Hacking GNU MediaGoblin for blog image hosting

I recently managed to hack GNU MediaGoblin in such a way that it is now really easy to use with my current blogging setup. Let's see how all of this works!

Contents

My blog setup
- Struggling with image hosting
Playing with MediaGoblin's API
- The public_id indirection
Creating a Pelican plugin
Conclusion

My blog setup

My current blog uses the awesome Pelican static website generator. It is written in Python (a language that I love) and is very fun to play with.

I have never been satisfied with my setup for hosting images. At first I was committing images directly into the git repository. This made the repository grow with each image modification, because git keeps the whole history of every image.

Then I moved to Git LFS, which is intended for efficient binary storage on top of git itself. I created a whole new git repository from scratch, losing the history of the blog code, but using Git LFS from the start so that the repository stays small over time. However, I sometimes run into Git LFS errors when cloning or adding new images. I am not sure whether this is because of my current Git setup or because I am using it wrong.

So I wanted to try a third option while keeping Git LFS for the existing images. This is why I installed my own GNU MediaGoblin instance and started experimenting with it, especially with its HTTP REST API.

Playing with MediaGoblin's API

After a lot of tests using the Python requests library and the requests-oauth overlay, I managed to talk to the GMG API. It was not that easy, especially because the OAuth1 workflow is quite cumbersome.
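To give a rough idea of what a signed call looks like, here is a minimal sketch. It assumes the requests-oauthlib package (which may not be the exact overlay used for these tests) and that the OAuth1 dance is already done, so the four credentials are at hand. The function names and the keys of the `creds` dict are hypothetical, chosen only for this illustration.

```python
REQUIRED_KEYS = ("client_key", "client_secret", "token", "token_secret")


def missing_credentials(creds):
    """Return the names of the OAuth1 values absent from `creds`."""
    return [key for key in REQUIRED_KEYS if not creds.get(key)]


def fetch_json(url, creds):
    """GET an OAuth1-signed URL and return the decoded JSON body.

    `creds` is a hypothetical dict holding the client key/secret and the
    access token/secret obtained beforehand.
    """
    missing = missing_credentials(creds)
    if missing:
        raise ValueError("missing OAuth1 credentials: " + ", ".join(missing))

    # Imported lazily so this module loads even without the third-party package.
    from requests_oauthlib import OAuth1Session

    session = OAuth1Session(
        creds["client_key"],
        client_secret=creds["client_secret"],
        resource_owner_key=creds["token"],
        resource_owner_secret=creds["token_secret"],
    )
    response = session.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx answers
    return response.json()
```

Checking the credentials first keeps the error message readable when one value is missing, instead of failing somewhere inside the signing code.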

The public_id indirection

There is an indirection that was hard for me to understand at first: accessing an image through the API is only possible if you know the image's public_id. It is a URL with a UUID inside, looking like this:

https://media.microjoe.org/api/image/c93bc9ea-359e-4a33-9df8-30694f1e18b1

This is totally different from the static image URL that you see when browsing the GMG gallery, which may look like this:

https://media.microjoe.org/mgoblin_media/media_entries/41/lcov.png

I think there is no way to deduce the image's public_id from the regular database ID that is used in this second URL. So the ways I found to obtain the public_id are the following:

Upon image upload using the API, you get the image's public_id (but what if you upload the image using the web interface?);
Using a for loop over all images with the API, you can find the public_id (this is a kind of brute-force approach);
Looking right into GMG's SQLite database to find out the public_id of an image.
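The second option, looping over every image, could look roughly like this once the image objects have been fetched and decoded from JSON. The key names (`id`, `image`, `fullImage`, `url`) are what I assume the API returns for an image object, so check them against a real response. The function name and the sample data are made up for the illustration.

```python
def find_public_id(objects, filename):
    """Return the `id` of the first object whose image URL ends with `filename`.

    `objects` is a list of dicts decoded from the API. The keys used below
    are assumptions about the response format, not a documented contract.
    """
    for obj in objects:
        urls = [
            obj.get("image", {}).get("url", ""),
            obj.get("fullImage", {}).get("url", ""),
        ]
        if any(url.endswith("/" + filename) for url in urls):
            return obj.get("id")
    return None  # nothing matched: the image is not in this batch


# Made-up sample data standing in for decoded API objects.
sample = [
    {
        "id": "https://example.org/api/image/00000000-0000-0000-0000-000000000001",
        "fullImage": {"url": "https://example.org/media_entries/1/cat.png"},
    },
    {
        "id": "https://example.org/api/image/00000000-0000-0000-0000-000000000002",
        "fullImage": {"url": "https://example.org/media_entries/2/dog.png"},
    },
]
found = find_public_id(sample, "dog.png")
```

Matching on the filename is fragile when two uploads share the same name, which is one more reason this approach felt unsatisfying.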

I was not satisfied with any of these solutions, so I wrote a quick'n'dirty patch (which has no chance of being merged as is) that shows me the image's public_id right at the bottom of the image page. See for example this one: https://media.microjoe.org/u/microjoe/m/4-preview-jpg/.

So now I can directly find the correct image URL for images that I upload using the web interface. Awesome. With this public_id we can do a lot of things. Let's go create an awesome Pelican plugin that can use this API for great good!

Warning: after discussing with the GMG folks, it appears that the current API is going to be rewritten soon to implement the ActivityPub API format. This API hack may stop working in the coming months, and maybe something better will be designed so that I do not have to display the public_id on all my pages. This is why I will not detail the GMG hack any further.

Creating a Pelican plugin

It is now time to write a Pelican plugin (in Python) to interact with our MediaGoblin media server.

An inspiration from a Flickr plugin

I wanted a really simple Pelican plugin that can at least automatically retrieve the image URL (which can vary a lot in GMG because it tries to keep the original filename) from the public_id that we now have.

I wrote a quick prototype and named it gmg, for lack of inspiration and in order to start coding quickly. It was almost copied from a plugin that originally interacts with Flickr and that we use for the website of our hackerspace, HAUM.

Plugin features

It can be found on my GitHub under the name pelican-gmg. I damn hate GitHub, but my current git hosting is going to be migrated back to cgit anytime soon, so…

The main features are:

Retrieve an image from its GMG public_id using OAuth1 stuff
Store all image metadata (such as image URL, title, description) in a pickled dict cache (because my GMG is very slow and takes 500 ms to answer one API call)
Create a figure wherever the gmg tag appears
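The caching idea from the list above can be sketched in a few lines. This is not the code of pelican-gmg, only an illustration of a pickled dict cache. The class name, the file path and the `fetch` callback (standing in for the slow API call) are hypothetical.

```python
import os
import pickle


class MetadataCache:
    """Dict of image metadata persisted to disk with pickle."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            # Only load cache files you wrote yourself: pickle is unsafe
            # on untrusted input.
            with open(path, "rb") as handle:
                self.data = pickle.load(handle)

    def get(self, image_uuid, fetch):
        """Return cached metadata, calling `fetch(image_uuid)` on a miss."""
        if image_uuid not in self.data:
            self.data[image_uuid] = fetch(image_uuid)
            self.save()
        return self.data[image_uuid]

    def save(self):
        with open(self.path, "wb") as handle:
            pickle.dump(self.data, handle)
```

With a cache like this, only the first build after adding a new image pays for the API call. Later builds read the metadata straight from disk.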

The plugin does not only use the image URL, but also all the GMG metadata in order to generate figures:

The title for the alt attribute of the image
The description for the figure caption
The sized-down image for inserting into the article
The full resolution image as a link when clicking on the image
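The mapping from metadata to figure listed above could be sketched like this. The keys of the `meta` dict (`title`, `description`, `thumb_url`, `full_url`) are names I picked for the example, not the ones GMG or the plugin uses, and the exact HTML produced by the real plugin may differ.

```python
from html import escape


def render_figure(meta):
    """Build an HTML figure from a metadata dict (hypothetical key names)."""
    full = meta["full_url"]
    # Use the sized-down image when there is one, else the full-size image.
    thumb = meta.get("thumb_url") or full
    caption = meta.get("description", "")
    parts = [
        "<figure>",
        '<a href="{}"><img src="{}" alt="{}"></a>'.format(
            escape(full), escape(thumb), escape(meta.get("title", ""))
        ),
    ]
    if caption:
        parts.append("<figcaption>{}</figcaption>".format(escape(caption)))
    parts.append("</figure>")
    return "".join(parts)
```

Escaping every value matters here because titles and descriptions are free text typed into the GMG web interface, and a stray quote would otherwise break the alt attribute.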

Plugin example

Here is an example. You can see the media entry that I created for this example at this address: https://media.microjoe.org/u/microjoe/m/image-of-a-goblin-holding-a-laptop/

Since I have a custom version of GMG, you should be able to see the API URL below the image description. It is not very beautiful but it works.

Next in this reST article I am going to insert the following line:

[gmg: id=bc77f0e0-8af6-462c-acf7-9ed606134034]

Note that we reuse the UUID part of the full public_id URL. Also note that I had to add a space after the gmg: because the regex is not very smart and would also match this block inside a code block. Below is the result when I remove the space and the regex matches properly:

Hello, GNU MediaGoblin!

You can see that the alt attribute of the image is the GMG title, the caption is the GMG description, and if you click on the image you get the full resolution version. The plugin is smart enough to display the full resolution image if GMG generated no thumbnail because the image was small enough.
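The tag matching described in this example can be sketched as follows. The pattern is my own guess at a regex that behaves as described (it matches the tag without the space and ignores the one with the space). It is not copied from the plugin, and the helper names are hypothetical. The `/api/image/` path comes from the public_id URL shown earlier.

```python
import re

# Matches [gmg:id=<uuid>] with no space after "gmg:".
GMG_TAG = re.compile(
    r"\[gmg:id=(?P<uuid>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}"
    r"-[0-9a-f]{4}-[0-9a-f]{12})\]"
)


def public_id(base_url, image_uuid):
    """Rebuild the full public_id URL from the UUID found in a tag."""
    return "{}/api/image/{}".format(base_url.rstrip("/"), image_uuid)


def replace_tags(content, render):
    """Replace every gmg tag in `content` with `render(uuid)`."""
    return GMG_TAG.sub(lambda match: render(match.group("uuid")), content)


text = (
    "literal: [gmg: id=bc77f0e0-8af6-462c-acf7-9ed606134034] "
    "real: [gmg:id=bc77f0e0-8af6-462c-acf7-9ed606134034]"
)
result = replace_tags(text, lambda u: public_id("https://media.microjoe.org", u))
```

A plain substitution like this knows nothing about code blocks, which is exactly why the extra space is needed when the tag must be shown literally.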

Conclusion

Here we are! This article was written in a hurry to share my experiments with the GMG community (hence the English article on my almost French-only blog).

I fear that with the new GMG API this work will be lost and I'll have to ensure backward compatibility, so that all my tags in my blog still work the same. Let's hope that the GMG team keeps the public_id in the database, so that all these GMG blocks will still be able to query the new API somehow.