Django pattern: download to FileField

2018-03-09 by Senko Rašić

Django's ORM comes with great support for storing files that relate to database fields but are stored elsewhere (local filesystem, cloud storage, you name it). The workhorse of this functionality is the FileField and related code from Django's file access API. This is also the basis for ImageField, which you probably use for things such as user avatars or other user-uploaded images, even if your use case doesn't involve managing other generic uploaded files.

A common UI pattern these days is to allow the user to either upload the file, import from a connected cloud storage account (Google Drive, Dropbox and so on), or paste a link to the file located anywhere on the Internet. The user might also be signing up via a social provider, with your application getting the URL to the user's profile image in the account data for the newly created user.

Importing files from cloud storage goes through each service's own API and differs slightly from service to service. In this article, we'll focus only on the last case: when the user (or a social auth provider) supplies a URL to the file and we want to import it.

What we want to do is download the file from the URL to a temporary location on the server, save it to the FileField (which will potentially have the effect of uploading it somewhere else, for example S3, if you use such a service for file storage), and remove any temporary files left behind.

This is common enough to warrant extracting into its own function, yet simple enough that it doesn't really need to be shipped as a separate standalone package.

(Note: the examples here assume Python 3. The same functionality is available in Python 2; only the import paths for some functions differ.)

The quick and simple way

The simplest way is to use the urlretrieve function from the Python standard library. A naive approach could work like this:

from django.core.files import File
from urllib.request import urlretrieve


def download_to_file_field(url, field):
    tempname, _ = urlretrieve(url)
    field.save(tempname, File(open(tempname, 'rb')))

While this mostly works, it has a few problems. The bigger problem is that it doesn't clean up after itself, so temporary files will keep piling up on your server. The smaller problem, probably just an annoyance, is that the name of the saved file will be basically random, as it's based on the temporary file name, which has no relation to the actual name (URL) of the file being downloaded.

Let's improve it a bit:

from django.core.files import File
from os.path import basename
from urllib.request import urlretrieve, urlcleanup
from urllib.parse import urlsplit


def download_to_file_field(url, field):
    try:
        tempname, _ = urlretrieve(url)
        # Close the file handle before urlcleanup() deletes the file.
        with open(tempname, 'rb') as f:
            field.save(basename(urlsplit(url).path), File(f))
    finally:
        urlcleanup()

This has two additions. The first is that it always calls urlcleanup, which, as the name suggests, cleans up after urlretrieve. That is, it deletes any temporary files that might have been created. We wrap it in a finally clause so it is always executed, no matter whether the download succeeded or failed.
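To see the cleanup happen without touching the network, you can feed urlretrieve a data: URL, which goes through the same temporary-file code path as a real download. This is only a sketch: the function name retrieve_and_cleanup is made up for illustration, and the data: URL is a stand-in for a real file.

```python
import os
from urllib.request import urlretrieve, urlcleanup


def retrieve_and_cleanup(url):
    """Fetch url, then report whether the temp file existed before and after cleanup."""
    try:
        tempname, _ = urlretrieve(url)
        with open(tempname, 'rb') as f:
            content = f.read()
        exists_before = os.path.exists(tempname)
    finally:
        # Deletes every temporary file urlretrieve has created so far.
        urlcleanup()
    return content, exists_before, os.path.exists(tempname)


# A data: URL (hypothetical stand-in for a real download) carrying "hello".
content, before, after = retrieve_and_cleanup('data:text/plain;base64,aGVsbG8=')
```

Here the temporary file should exist right after the download and be gone once urlcleanup has run.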

The second addition is that we use the filename from the URL instead of the temporary name. If the URL ends with a normal-looking name (for example, if the URL was https://example.com/path/to/images/profile.jpg), our file will be named accordingly (in this example, profile.jpg). The Django file API may change the name further (for example if a file with the same name already exists, or if your upload_to function overrides the name), but it's still an improvement over the original, totally arbitrary, behaviour.
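Not every URL ends with a normal-looking name, though. A URL that ends in a slash, or has no path at all, gives an empty basename, and percent-escapes are kept as-is. If that matters for your use case, a slightly more defensive helper might look like the sketch below. The name filename_from_url and the 'download' fallback are assumptions for illustration, not part of Django or the standard library.

```python
from os.path import basename
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default='download'):
    """Return the last path component of url, or default if there is none."""
    # urlsplit separates the path from the query string and fragment,
    # so "?size=large" or "#top" never ends up in the file name.
    path = urlsplit(url).path
    # Decode percent-escapes such as %20 before taking the last component.
    name = basename(unquote(path))
    # A trailing slash or an empty path leaves us with an empty name.
    return name or default
```

With this helper, https://example.com/avatar.png?size=large#top yields avatar.png, while https://example.com/images/ falls back to the default name.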

Requests for all the things!

The urlretrieve function in the standard library is great if you only need to download a static file and if you don't need to do anything funny with the request. If you need to craft a special request (for example set headers, cookies, query params and so on), you may want to implement the same thing using the awesome requests library.

Here's how that might look:

from django.core.files import File
from os.path import basename
import requests
from tempfile import TemporaryFile
from urllib.parse import urlsplit


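def download_to_file_field(url, field):
    with TemporaryFile() as tf:
        r = requests.get(url, stream=True)
        # Fail on 4xx/5xx responses instead of saving an error page.
        r.raise_for_status()
        for chunk in r.iter_content(chunk_size=4096):
            tf.write(chunk)

        # Rewind so the FileField reads the content from the start.
        tf.seek(0)
        field.save(basename(urlsplit(url).path), File(tf))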

Here, we're creating a temporary file using TemporaryFile from the standard library. We're using it as a context manager, so it'll automatically clean up (delete the temporary file) when the block is exited. We then make a GET request with requests in streaming mode, so it doesn't try to load the entire file into memory. Instead it gives us chunks, which we write to the temporary file. This approach avoids problems if the files to download are big relative to available memory (you never know when your user might attempt to set a two-hour full-HD movie as their avatar picture). The urlretrieve function works the same way.

After we've downloaded the content, we seek to the beginning of the open file (so we can read from the start), and tell the FileField to save it. We use the same logic for naming the file as before.
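Streaming keeps memory use flat, but it won't stop that two-hour movie from filling up your disk. Since we already handle the content chunk by chunk, it's easy to enforce a size cap while writing. The following is a minimal sketch under that assumption; write_chunks and DownloadTooLarge are hypothetical names, and the list of byte strings stands in for r.iter_content(chunk_size=4096).

```python
from tempfile import TemporaryFile


class DownloadTooLarge(Exception):
    """Raised when the incoming data exceeds the allowed size."""


def write_chunks(chunks, fileobj, max_bytes):
    """Write chunks to fileobj, refusing to go past max_bytes in total."""
    written = 0
    for chunk in chunks:
        written += len(chunk)
        if written > max_bytes:
            # Stop before writing the chunk that crosses the limit.
            raise DownloadTooLarge('more than %d bytes received' % max_bytes)
        fileobj.write(chunk)
    return written


# Any iterable of bytes works here, including r.iter_content(...).
with TemporaryFile() as tf:
    total = write_chunks([b'abc', b'de'], tf, max_bytes=10)
    tf.seek(0)
    saved = tf.read()
```

In download_to_file_field, the for loop over r.iter_content would be replaced by a single call such as write_chunks(r.iter_content(chunk_size=4096), tf, max_bytes=5 * 1024 * 1024), with the limit picked to suit your application.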

To avoid complicating the example, we haven't used any of the extra features the requests library gives us. In the real world you can do anything the requests library supports here (and if you don't need it, consider using the simpler urlretrieve approach).
