Speed up easy-thumbnails on S3

Posted in Software

I was developing a small app as a playground and confidence builder, choosing Django, Heroku and S3 as resources. One of the packages I used is easy-thumbnails.


I chose to use the easy-thumbnails app with S3, a combination made possible by the django-storages framework. At a glance, everything works OK: thumbnails are created and rendered accordingly. However, on a thumbnail-heavy page, I (and my profiling app) noticed significant load times (about 3s for a page with 7 thumbnails).

Drilling down, it became evident that something was checking the S3 bucket for each thumbnail (an httplib call occurred for each one).
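For what it's worth, one easy way to make those calls visible (assuming the boto-based S3 backend that django-storages used at the time) is to turn up boto's logger:

```python
import logging

# boto logs every HTTP request at DEBUG level; raising its logger makes the
# per-thumbnail bucket checks show up in the console. With the newer boto3
# backend, the logger name would be 'botocore' instead.
logging.basicConfig()
logging.getLogger('boto').setLevel(logging.DEBUG)
```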


TL;DR: If you use the default ImageField, the thumbnail storage is created by the Thumbnailer framework. Each time a storage object is created, it checks for the bucket's existence, which means a query to S3. Lots of images means lots of queries to S3, all checking the same bucket's existence.

The logic goes something like this:

  1. The flow starts in easy_thumbnails' thumbnail.py template tag.
  2. In easy_thumbnails.files.py, executing line 52 creates a new ThumbnailerFieldFile() object.
  3. easy_thumbnails.fields.py creates a ThumbnailerField() object, which takes a thumbnail_storage parameter. That parameter is empty in the above call, which later results in creating the storage object.
  4. In easy_thumbnails.files.py, the Thumbnailer class constructor builds a fresh storage object whenever thumbnail_storage is not supplied.
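Paraphrasing rather than quoting (the names below are stand-ins, not the verbatim easy_thumbnails source), the constructor's fallback behaves like this:

```python
# Paraphrased sketch -- these names are stand-ins, not the actual
# easy_thumbnails source.

class S3LikeStorage:
    """Mimics a django-storages S3 backend whose constructor hits S3."""
    bucket_checks = 0

    def __init__(self):
        # The real backend can issue a bucket-existence request here.
        S3LikeStorage.bucket_checks += 1


def get_default_thumbnail_storage():
    # Stand-in for the lookup that instantiates the class named by
    # settings.THUMBNAIL_DEFAULT_STORAGE.
    return S3LikeStorage()


class Thumbnailer:
    def __init__(self, thumbnail_storage=None):
        # The key line: when no storage is passed in, a fresh one is built.
        self.thumbnail_storage = (thumbnail_storage
                                  or get_default_thumbnail_storage())


# Seven thumbnails on a page -> seven Thumbnailers -> seven bucket checks.
for _ in range(7):
    Thumbnailer()
print(S3LikeStorage.bucket_checks)  # -> 7
```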


I have thought of two options:

  1. Either tweak the Thumbnailer class constructor in easy_thumbnails.files.py, or
  2. Find a way not to re-create the storage object for every thumbnail.

I ended up using something like this in my model:
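A minimal sketch of that field declaration (Photo and image are my placeholder names, and thumbnail_storage is the module-level storage instance described below):

```python
# models.py -- 'Photo' and 'image' are placeholder names, not from the app.
from django.db import models
from easy_thumbnails.fields import ThumbnailerImageField


class Photo(models.Model):
    # Passing thumbnail_storage explicitly means the Thumbnailer never has
    # to build (and bucket-check) its own storage object.
    image = ThumbnailerImageField(upload_to='photos',
                                  thumbnail_storage=thumbnail_storage)
```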

Here, thumbnail_storage is initialised only once, at the top of the models.py file:
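Roughly like so (the backend class is an assumption: S3BotoStorage matches the boto-era django-storages; with current django-storages the equivalent would be storages.backends.s3boto3.S3Boto3Storage):

```python
# Top of models.py. The backend class is an assumption on my part --
# adjust the import to whichever S3 backend your django-storages provides.
from storages.backends.s3boto import S3BotoStorage

# Instantiated once at import time: the bucket-existence check now happens
# once per Django process instead of once per thumbnail.
thumbnail_storage = S3BotoStorage()
```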

This way, the storage object is created only once per Django process, and the bucket's existence is also queried only once.


Now, a page query takes about 120ms on average instead of 3s. Wow. Much speedup.
