Improve S3Boto Django-Storages Performance with Custom Settings

Django-storages is a great piece of software. It provides a uniform API for all kinds of file storage backends, including Amazon's S3. I use it for storing assets (the /static/ and /media/ files) for my django app. However, particularly with the /static/ assets, I've noticed firebug/yslow complaining about performance, so I decided to dig deeper into the settings.

First, there are some nice articles to help you decide what you need to improve performance (you can find even more with a simple search). These provided some hints on where to start.

S3Boto Settings

To my surprise, I found that there are quite a lot of settings in the S3Boto backend for django-storages. You can tweak almost everything you want for your files. At the time of writing this article, here are the available settings.

A first batch of settings is used to establish the communication to the server:

  • AWS API keys. You need those to connect to S3.
    • AWS_S3_ACCESS_KEY_ID or AWS_ACCESS_KEY_ID
    • AWS_S3_SECRET_ACCESS_KEY or AWS_SECRET_ACCESS_KEY
  • AWS_S3_HOST (default S3Connection.DefaultHost) can be used to customise the S3 host. You probably don't need to change anything here.
  • AWS_S3_USE_SSL (default True) use SSL when communicating with S3.
  • AWS_S3_PORT (default None) allows for possible customisation of the port when communicating with S3.
  • AWS_QUERYSTRING_AUTH (default True) is useful for giving HTTP or browser access to resources that would normally require authentication, by signing the URL with a query string.
  • AWS_QUERYSTRING_EXPIRE (default 3600) defines a timeout in seconds for the AWS_QUERYSTRING_AUTH above.
  • AWS_S3_CALLING_FORMAT (default SubdomainCallingFormat()) defines the calling format for S3. This is a boto component.
  • AWS_S3_ENCRYPTION (default False) sets the S3 server-side encryption. If True, the object will be stored encrypted on S3.
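
As a quick illustration, a minimal settings.py fragment wiring up this first batch might look like the sketch below. The values are my assumptions for a site serving public assets, not defaults you must use:

```python
import os

# Credentials come from the environment rather than being hard-coded
AWS_ACCESS_KEY_ID = os.environ.get('AWS_ACCESS_KEY_ID')
AWS_SECRET_ACCESS_KEY = os.environ.get('AWS_SECRET_ACCESS_KEY')

# Keep the defaults explicit for documentation purposes
AWS_S3_USE_SSL = True          # talk to S3 over SSL
AWS_QUERYSTRING_AUTH = False   # public assets don't need signed URLs
AWS_QUERYSTRING_EXPIRE = 3600  # only relevant when querystring auth is on
```

Turning off AWS_QUERYSTRING_AUTH for public assets also helps caching, since the URL no longer changes with every signature.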

A second batch of settings is used to specify file properties:

  • AWS_REDUCED_REDUNDANCY (default False) specifies whether files use the Reduced Redundancy storage class of S3

  • AWS_S3_FILE_OVERWRITE (default True) specifies whether files are overwritten on update. The mechanism is used by the method that retrieves an available name; if overwrite is enabled, the file keeps its original name. Otherwise, an alternative name is generated.

  • AWS_STORAGE_BUCKET_NAME (default None) defines the bucket name. This is required to identify the bucket where files are stored.

  • AWS_DEFAULT_ACL (default 'public-read') defines the per-file ACL. The accepted values correspond to the canned ACLs from S3

  • AWS_HEADERS (default {}) defines the HTTP headers returned by S3 for a file.

  • AWS_IS_GZIPPED (default False) creates gzipped versions of the file on S3. It's used in conjunction with the GZIP_CONTENT_TYPES.

  • GZIP_CONTENT_TYPES (default ('text/css', 'application/javascript', 'application/x-javascript')) defines the content types that will be gzipped
    
  • AWS_S3_FILE_NAME_CHARSET (default 'utf-8') specifies the charset for file names. You can change it, but UTF covers most characters.

  • AWS_PRELOAD_METADATA (default False) allows for caching some metadata for a bucket's files
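
The file-property batch can be sketched as a settings fragment too. The max-age value and the gzip list below are just examples of how these settings fit together, not recommendations:

```python
# Serve gzipped variants of text-based assets
AWS_IS_GZIPPED = True
GZIP_CONTENT_TYPES = (
    'text/css',
    'application/javascript',
    'application/x-javascript',
)

# Cheaper storage class and public-read canned ACL for static assets
AWS_REDUCED_REDUNDANCY = True
AWS_DEFAULT_ACL = 'public-read'

# Extra HTTP headers S3 will return with each file
AWS_HEADERS = {
    'Cache-Control': 'max-age=86400',  # cache for one day
}
```

Note that AWS_IS_GZIPPED only has an effect for the content types listed in GZIP_CONTENT_TYPES.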

Another group is used for storage backend API (e.g. helpers, generating the URL):

  • AWS_AUTO_CREATE_BUCKET (default False) is a helper to avoid setup errors. If set to True, the bucket will be created automatically. One can imagine a use case allowing for dynamic bucket creation (e.g. sharding or per-user buckets). It's also nicely similar to the get_or_create API in the ORM.
  • AWS_S3_URL_PROTOCOL (default 'http:') defines the protocol used to access the files. Default is HTTP, but you can set it to HTTPS if necessary.
  • AWS_LOCATION (default '') is used to normalise the URLs. One can use it to prepend a specific path like /media/my_app/images/ to the file name. Otherwise, the path will be relative to the root of the bucket
  • AWS_S3_CUSTOM_DOMAIN (default None) can allow for a custom domain to be used in accessing the file from S3 (e.g. static.currio.com instead of static.currio.com.s3...com)
  • AWS_S3_SECURE_URLS (default True) is used in URL building and specifies HTTPS as protocol.
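
The URL-building group can be sketched like this; static.example.com is a placeholder for a domain you would CNAME to the bucket:

```python
# Serve files from a custom domain instead of the default bucket URL.
# 'static.example.com' is a placeholder and must be CNAME'd to the bucket.
AWS_S3_CUSTOM_DOMAIN = 'static.example.com'
AWS_S3_URL_PROTOCOL = 'http:'
AWS_S3_SECURE_URLS = False  # build http:// URLs instead of https://

# With these settings, a file stored as 'css/site.css' would be served from
# http://static.example.com/css/site.css
```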

Purpose

The point is that these settings can be tweaked to improve performance (better caching, gzipped files and so on). My settings look something like this:

from storages.backends.s3boto import S3BotoStorage

StaticRootS3BotoStorage = lambda: S3BotoStorage(
    bucket="static-bucket",
    reduced_redundancy=True,
    headers={
        'Cache-Control': 'max-age=2592000',
    }
)
MediaRootS3BotoStorage = lambda: S3BotoStorage(bucket="media-bucket")

This variant allows the static files to use reduced redundancy (cheaper) and adds a Cache-Control header. I should add gzip too. It also keeps static and media files in separate buckets.

The lambda expressions take advantage of the implementation: each of the settings above has its own instance variable that can be populated from the constructor. This allows per-storage parameters; defining the above as Django settings would result in the same values for all S3 storages.
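
The mechanism can be sketched with a toy class (not the real S3BotoStorage, whose constructor takes many more parameters): each setting has a module-level default that a constructor keyword can override, which is what makes per-storage parameters possible:

```python
# Module-level defaults, standing in for the Django settings
DEFAULT_BUCKET = 'app'
DEFAULT_HEADERS = {}

class ToyStorage:
    """Mimics how S3BotoStorage lets constructor kwargs override settings."""
    def __init__(self, bucket=None, headers=None):
        self.bucket = bucket if bucket is not None else DEFAULT_BUCKET
        self.headers = headers if headers is not None else DEFAULT_HEADERS

# Per-storage parameters via lambdas, as in the snippet above
StaticStorage = lambda: ToyStorage(bucket='static-bucket',
                                   headers={'Cache-Control': 'max-age=2592000'})
MediaStorage = lambda: ToyStorage()  # falls back to the module defaults
```

Each call to a lambda builds a fresh instance with its own bucket and headers, while any parameter left out falls back to the shared default.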

The S3-specific settings look like this:

import os

AWS_ACCESS_KEY_ID = os.environ.get('AWS_ACCESS_KEY_ID', None)
AWS_SECRET_ACCESS_KEY = os.environ.get('AWS_SECRET_ACCESS_KEY', None)
AWS_S3_SECURE_URLS = False

# Enable S3 deployment only if we have the AWS keys
#
S3_DEPLOYMENT = AWS_ACCESS_KEY_ID is not None

if S3_DEPLOYMENT:
    MEDIA_ROOT = '/media/'
    STATIC_ROOT = '/static/'
    AWS_STORAGE_BUCKET_NAME = 'app'

    DEFAULT_FILE_STORAGE = 'app.s3utils.MediaRootS3BotoStorage'
    STATICFILES_STORAGE = 'app.s3utils.StaticRootS3BotoStorage'
    THUMBNAIL_DEFAULT_STORAGE = 'app.s3utils.MediaRootS3BotoStorage'

    STATIC_URL = "http://static-bucket.s3-website-us-east-1.amazonaws.com/"
    MEDIA_URL = "http://media-bucket.s3-website-us-east-1.amazonaws.com/"
else:
    # S3 is disabled; log it so a missing key is easy to spot
    import logging
    logging.getLogger(__name__).info('S3 deployment disabled: AWS keys not set')

HTH,