Django's ORM comes with great support for storing files that relate to database
fields but live elsewhere (the local filesystem, cloud storage, you name it). The workhorse of this functionality is
FileField and the related code from Django's file access API.
This is also the basis for
ImageField, which you probably use for things such as user avatars or other user-uploaded
images, even if your use case doesn't involve managing other generic uploaded files.
A common UI pattern these days is to allow the user to either upload the file, import from a connected cloud storage account (Google Drive, Dropbox and so on), or paste a link to the file located anywhere on the Internet. The user might also be signing up via a social provider, with your application getting the URL to the user's profile image in the account data for the newly created user.
Importing files from cloud storage uses each service's API directly and differs slightly from service to service. In this article, we'll focus only on the last case: when the user (or a social auth provider) supplies a URL to the file and we want to import it.
What we want to do is download the file from the URL to a temporary location on the server, save it to the FileField (which may have the side effect of uploading it somewhere else, for example S3, if you use such a service for file storage), and remove any temporary files left behind.
This is common enough to warrant extracting into its own function, yet simple enough that it doesn't really need to be packaged and distributed as a separate standalone package.
(Note: the examples here assume Python 3. All the same functionality is available in Python 2; only the import paths for some functions differ.)
The quick and simple way
The simplest way is to use the
urlretrieve function from the Python standard library. A naive approach could look like this:

```python
from django.core.files import File
from urllib.request import urlretrieve

def download_to_file_field(url, field):
    tempname, _ = urlretrieve(url)
    field.save(tempname, File(open(tempname, 'rb')))
```
While this mostly works, it has a couple of problems. The bigger one is that it doesn't clean up after itself, so temporary files will keep piling up on your server. The smaller one, probably just an annoyance, is that the name of the saved file will be essentially random: it's based on the temporary file name, which has no relation to the actual name (URL) of the file being downloaded.
Let's improve it a bit:
```python
from django.core.files import File
from os.path import basename
from urllib.request import urlretrieve, urlcleanup
from urllib.parse import urlsplit

def download_to_file_field(url, field):
    try:
        tempname, _ = urlretrieve(url)
        field.save(basename(urlsplit(url).path), File(open(tempname, 'rb')))
    finally:
        urlcleanup()
```
This has two additions. The first is that it always calls
urlcleanup, which, as the name suggests, cleans up after
urlretrieve: it deletes any temporary files that might have been created. We wrap it in a try/finally block
so it is always executed, whether the download succeeded or failed.
The second addition is that we use the filename from the URL instead of the temporary name. If the URL ends with
a normal-looking name (for example, if the URL was
https://example.com/path/to/images/profile.jpg), our file will
be named accordingly (in this example,
profile.jpg). The Django file API may change the name further (for example,
if a file with the same name already exists, or if your
upload_to function overrides the name), but it's still
an improvement on the original, totally arbitrary behaviour.
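The naming logic can be seen in isolation using just the standard library (the URL below is the example from above):

```python
from os.path import basename
from urllib.parse import urlsplit

url = 'https://example.com/path/to/images/profile.jpg'
# urlsplit separates the path from the scheme, host and query string;
# basename then keeps only the last path component.
filename = basename(urlsplit(url).path)
print(filename)  # → profile.jpg
```

Note that going through urlsplit first also strips any query string, so a URL like https://example.com/a/b.png?size=large still yields b.png.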
Requests for all the things!
The urlretrieve function in the standard library is great if you only need to download a static file and
don't need to do anything fancy with the request. If you need to craft a special request (for example, set headers,
cookies, query params and so on), you may want to implement the same thing using the awesome requests library.
Here's how that might look:
```python
from django.core.files import File
from os.path import basename
import requests
from tempfile import TemporaryFile
from urllib.parse import urlsplit

def download_to_file_field(url, field):
    with TemporaryFile() as tf:
        r = requests.get(url, stream=True)
        for chunk in r.iter_content(chunk_size=4096):
            tf.write(chunk)
        tf.seek(0)
        field.save(basename(urlsplit(url).path), File(tf))
```
Here, we're creating a temporary file using
TemporaryFile from the standard library. We're using it
as a context manager, so it'll automatically clean up (delete the temporary file) after the code block is exited.
We then make a GET request with
requests in streaming mode, so it doesn't try to hold the entire file in memory.
Instead it gives us chunks, which we write to the temporary file. This avoids problems when the files
being downloaded are big relative to the available memory (you never know when a user might attempt to set a two-hour
full-HD movie as their avatar picture). The
urlretrieve function works in the same way.
After we've downloaded the content, we seek to the beginning of the opened file (so we can read from the start) and tell the FileField to save it. We use the same logic for naming the file as before.
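The chunked-copy-then-seek pattern can be demonstrated without any network access; here an in-memory BytesIO stands in for the streamed response body (a sketch, not code from the snippet above):

```python
from io import BytesIO
from tempfile import TemporaryFile

source = BytesIO(b'x' * 10_000)  # stand-in for the streamed HTTP response

with TemporaryFile() as tf:
    # Copy in fixed-size chunks, like iter_content does, so only one
    # chunk is held in memory at a time.
    for chunk in iter(lambda: source.read(4096), b''):
        tf.write(chunk)
    tf.seek(0)  # rewind so the file can be read from the start
    copied = tf.read()

assert copied == b'x' * 10_000
```

Without the seek(0), the read would start at the end of the file and return nothing, and FileField.save would store an empty file.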
In this case, we haven't used any of the extra features the requests library gives us, to avoid complicating the example. In
the real world you can do anything the requests library supports here (and if you don't need any of it, consider using
the simpler urlretrieve-based version instead).
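As a sketch of what "crafting a special request" might look like, here's a request with custom headers and query parameters built with requests (the header and parameter values are purely illustrative, not part of the original example):

```python
import requests

req = requests.Request(
    'GET',
    'https://example.com/path/to/images/profile.jpg',
    headers={'Authorization': 'Bearer <token>'},  # hypothetical auth header
    params={'size': 'large'},                     # hypothetical query param
)
prepared = req.prepare()
# The prepared request carries the header and the encoded query string;
# in download_to_file_field you'd pass the same headers/params arguments
# to requests.get along with stream=True.
```

Building a Request and calling prepare() lets you inspect the final URL and headers before anything is sent over the network.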