Customizing filenames without patching Django

by Marty Alchin on November 7, 2007 about Django

Recently, I’ve noticed a good bit of chatter on the mailing lists about how to override the filename of an uploaded file before saving it. Django currently provides options for customizing the location based on upload time, but nothing else. Often, projects find the need to store attachments in directories according to details of the object or a related object. I’m doing a good bit of work on FileField at the moment, but that’s still a way off and doesn’t yet address this issue anyway.

With that said, I’ll be perfectly clear. The information below is based on details of how FileField currently works. Once the filestorage patch makes it into trunk, parts of this article will probably work for a while, but I make no guarantees about how long. If you follow trunk and update often, you’ve been warned.

Fair warning: Much of this article relies on implementation details internal to Django. I haven’t tested out every last facet of FileFields and how the techniques below might affect their behavior. It should work fine for the purposes I describe, but I make no guarantees of any kind.

Currently, the only technique you can reliably use is to override _save_FIELD_file on your model. This isn't the same as the officially documented save_FOO_file method; the method I'm referring to is literally named _save_FIELD_file. Note the leading underscore, and note that FIELD should not be replaced with the name of the appropriate field. More on that detail later. The documented save_FOO_file method gets assigned during creation of the model, so even if you define it yourself, it'll just get overwritten. Every time. Without your consent.

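The signature for _save_FIELD_file looks like this: def _save_FIELD_file(self, field, filename, raw_contents, save=True). Besides self, it takes the following arguments: field, the FileField instance being saved (the field object itself, not its value); filename, the name of the uploaded file; raw_contents, the contents of the file; and save, which controls whether the model instance itself is saved once the file has been written.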
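To make the overwriting behavior concrete, here's a toy imitation in plain Python. ToyFileField and ToyModel are hypothetical stand-ins, and the attach step only roughly mirrors what FileField.contribute_to_class does in Django 0.96: it sets a per-field save_FOO_file on the class that simply forwards to _save_FIELD_file. Unlike the real method, this toy _save_FIELD_file returns the resulting path so the outcome is visible.

```python
# Toy imitation of how a per-field save_FOO_file gets attached to a model.
# ToyFileField and ToyModel are hypothetical; this is not Django's code.

class ToyFileField(object):
    def __init__(self, upload_to):
        self.upload_to = upload_to
        self.name = None

    def contribute_to_class(self, cls, name):
        self.name = name
        field = self

        # Attach save_<name>_file, replacing anything the class already
        # defined under that name. It just forwards to the shared method.
        def save_file(instance, filename, raw_contents, save=True):
            return instance._save_FIELD_file(field, filename, raw_contents, save)

        setattr(cls, 'save_%s_file' % name, save_file)


class ToyModel(object):
    def _save_FIELD_file(self, field, filename, raw_contents, save=True):
        # Default behavior (simplified): join upload_to and the filename.
        return '%s/%s' % (field.upload_to, filename)


class Patch(ToyModel):
    def save_code_file(self, filename, raw_contents, save=True):
        return 'custom/' + filename  # silently overwritten below

    def _save_FIELD_file(self, field, filename, raw_contents, save=True):
        filename = 'renamed_' + filename
        return super(Patch, self)._save_FIELD_file(field, filename, raw_contents, save)


# Simulate model creation attaching the field to the class.
ToyFileField('patches').contribute_to_class(Patch, 'code')

print(Patch().save_code_file('fix.diff', b''))  # patches/renamed_fix.diff
```

The custom save_code_file never runs, while the _save_FIELD_file override does, because every per-field method funnels into it.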

For the following sections, we’ll start with models that look like this, and add to them as we go along:

from django.db import models
from django.contrib.auth.models import User

class Ticket(models.Model):
    author = models.ForeignKey(User)
    title = models.CharField(maxlength=255)
    description = models.TextField()

class Patch(models.Model):
    ticket = models.ForeignKey(Ticket)
    author = models.ForeignKey(User)
    code = models.FileField(upload_to='patches')

Obviously the models above will run into a naming conflict if two patches ever have the same filename. Django automatically works around this by adding underscores to the filename (just before the extension) until it finds an available name. However, if you have any other code (or your own eyes) that needs to look for these files, it'd be impossible to tell which ticket each patch belongs to.
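Here's a simplified sketch of that collision handling, modeled on the loop in Django 0.96's _save_FIELD_file, which inserts the underscore just before the last dot. The set named existing is a hypothetical stand-in for the os.path.exists() check Django performs against MEDIA_ROOT.

```python
# Simplified sketch of Django 0.96-style filename collision handling.
# `existing` stands in for checking the filesystem.

def available_name(filename, existing):
    while filename in existing:
        dot_index = filename.rfind('.')
        if dot_index == -1:
            # No dot at all: tack the underscore onto the end.
            filename += '_'
        else:
            # Otherwise insert it just before the extension.
            filename = filename[:dot_index] + '_' + filename[dot_index:]
    return filename


taken = {'patches/fix.diff', 'patches/fix_.diff'}
print(available_name('patches/fix.diff', taken))  # patches/fix__.diff
```

Nothing in the resulting name says which ticket the file belongs to, which is exactly the problem.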

Altering the filename

Thankfully, the filename is the easiest thing to change, and it can often be done with just a few lines of new code on your model. By adding the following method to the Patch model, we'll insert the ticket's ID at the beginning of the filename, so it can be easily identified outside of Django.

def _save_FIELD_file(self, field, filename, raw_contents, save=True):
    filename = "%s_%s" % (self.ticket_id, filename)
    super(Patch, self)._save_FIELD_file(field, filename, raw_contents, save)

Pretty painless, right? If a user uploads a file called quick_fix.diff for a ticket with an ID of 342, this method will save that file as 342_quick_fix.diff. An automated script that applies patches and tests the new code can then figure out which ticket each patch was attached to and include that number in a success/failure report. Or whatever you want to do with that ID.
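A script working outside Django only needs to reverse the naming rule. The sketch below uses two hypothetical helpers, one applying the same "%s_%s" prefix as the method above and one recovering the ticket ID by splitting at the first underscore.

```python
# Hypothetical helpers around the ticket-prefix naming rule.

def ticket_prefixed(ticket_id, filename):
    # Same formatting as the _save_FIELD_file override above.
    return '%s_%s' % (ticket_id, filename)


def ticket_from_filename(filename):
    # Everything before the first underscore is the ticket ID.
    prefix, _, _ = filename.partition('_')
    return int(prefix)


name = ticket_prefixed(342, 'quick_fix.diff')
print(name)                        # 342_quick_fix.diff
print(ticket_from_filename(name))  # 342
```

Splitting at the first underscore still works when the original filename contains underscores of its own, or when Django has added some to avoid a collision.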

Adding a subdirectory

A more common request is to be able to specify a subdirectory based on the current object's details. This is similar to the example above, but easier to manage by hand, since the files are split into many smaller directories rather than piled into one big one. By replacing the method we created above, we'll create a subdirectory with the username of the user who submitted the patch. It's a bit more complicated, but I'll explain.

def _save_FIELD_file(self, field, filename, raw_contents, save=True):
    # Temporarily point upload_to at a per-user subdirectory.
    original_upload_to = field.upload_to
    field.upload_to = '%s/%s' % (field.upload_to, self.author.username)
    try:
        super(Patch, self)._save_FIELD_file(field, filename, raw_contents, save)
    finally:
        # Always restore upload_to, even if saving the file fails.
        field.upload_to = original_upload_to

Now you can easily write a separate script to tell you how many patches a user has submitted, how big they are, how many lines, etc., all without touching the database. Neat!

Internally, Django strips any directory information from the filename argument, for security, so we have to put the new subdirectory in the field's upload_to attribute instead. This actually works out well anyway, since Django also includes logic to create any necessary subdirectories if they don't already exist. Bonus. Just make sure to always include the original upload_to at the beginning of the new value, so the subdirectories get put in the right spot.
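The sketch below shows why the subdirectory has to go in upload_to rather than in the filename. It is a simplification: stored_path is a hypothetical helper, and Django 0.96 additionally runs upload_to through strftime and sanitizes the basename. The core idea is that only the basename of the uploaded name survives.

```python
import posixpath

# Hypothetical, simplified version of how the stored path is built:
# upload_to supplies the directory; only the basename of filename is kept.
def stored_path(upload_to, filename):
    return posixpath.join(upload_to, posixpath.basename(filename))


print(stored_path('patches', 'gulopine/fix.diff'))  # patches/fix.diff
print(stored_path('patches/gulopine', 'fix.diff'))  # patches/gulopine/fix.diff
print(stored_path('patches', '../../settings.py'))  # patches/settings.py
```

The last line shows the security angle: a crafted filename can't climb out of the upload directory.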

One thing to keep in mind is that blindly modifying upload_to would affect all future files saved by that field, which we obviously don’t want. This is supposed to be limited to just the one file. We have to save the original upload_to value before modifying it, so it can be replaced once the file’s in place. As the code shows, this is very easy to do, but just know that it absolutely must be done. This messes around with things that ought not be messed with, so be sure to clean up after yourself. Again, you’ve been warned.
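If you find yourself swapping attributes like this often, a small context manager can make the cleanup automatic. This is a general Python pattern (it needs the with statement, so Python 2.5 or later), not something Django provides; temporarily and FakeField are hypothetical names.

```python
from contextlib import contextmanager


@contextmanager
def temporarily(obj, attr, value):
    # Swap in a new value for the duration of the block, then restore the
    # original, even if the block raises an exception.
    original = getattr(obj, attr)
    setattr(obj, attr, value)
    try:
        yield obj
    finally:
        setattr(obj, attr, original)


class FakeField(object):  # hypothetical stand-in for a FileField
    def __init__(self, upload_to):
        self.upload_to = upload_to


field = FakeField('patches')
try:
    with temporarily(field, 'upload_to', 'patches/gulopine'):
        print(field.upload_to)  # patches/gulopine
        raise IOError('disk full')
except IOError:
    pass
print(field.upload_to)  # patches
```

Inside _save_FIELD_file you would wrap the call to the parent method in such a with block, so upload_to is restored no matter how the save ends.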

Juggling multiple FileFields

Remember that _save_FIELD_file is declared once per model. This is the case no matter how many FileFields you have declared. Consider adding another field so that documentation patches can be maintained separately from code. Here’s how our model would look after this change, but without the method:

class Patch(models.Model):
    ticket = models.ForeignKey(Ticket)
    author = models.ForeignKey(User)
    code = models.FileField(upload_to='patches')
    docs = models.FileField(upload_to='docs')

You could add either of the previous methods to this model, and it would work out fine, as long as you don't mind having, for instance, code stored in patches/gulopine/new_feature.diff and docs stored in docs/gulopine/new_feature.txt. For this, though, I'll use the subdirectory approach with the ticket ID in place of the username, plus one other change. It's a fairly minor change, but it illustrates an important point.

def _save_FIELD_file(self, field, filename, raw_contents, save=True):
    original_upload_to = field.upload_to
    if field.name == 'code':
        # patches/<ticket_id>
        field.upload_to = '%s/%s' % (
            field.upload_to,
            self.ticket_id)
    elif field.name == 'docs':
        # patches/<ticket_id>/docs, built from the code field's upload_to
        code = self._meta.get_field('code')
        field.upload_to = '%s/%s/%s' % (
            code.upload_to,
            self.ticket_id,
            field.upload_to)
    try:
        super(Patch, self)._save_FIELD_file(field, filename, raw_contents, save)
    finally:
        field.upload_to = original_upload_to

As you can see, the method can use the field it was given to determine which type of file it’s dealing with. In this case, it stores all files, regardless of type, within a directory such as patches/342. Code is stored directly in this directory, while documentation is shuffled off into a docs subdirectory, so you might have patches/342/enhancement.diff and patches/342/docs/enhancement.txt.
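To check the resulting layout without touching Django, the branching can be pulled out into a pure function. The sketch below is hypothetical: UPLOAD_TO mirrors the upload_to values declared on the model, and upload_dir reproduces the directory the method above computes for each field.

```python
import posixpath

# Hypothetical mirror of the upload_to values declared on Patch.
UPLOAD_TO = {'code': 'patches', 'docs': 'docs'}


def upload_dir(field_name, ticket_id):
    # Directory that _save_FIELD_file would use for the given field.
    if field_name == 'code':
        return posixpath.join(UPLOAD_TO['code'], str(ticket_id))
    if field_name == 'docs':
        return posixpath.join(UPLOAD_TO['code'], str(ticket_id), UPLOAD_TO['docs'])
    raise ValueError('unexpected field: %s' % field_name)


print(posixpath.join(upload_dir('code', 342), 'enhancement.diff'))
# patches/342/enhancement.diff
print(posixpath.join(upload_dir('docs', 342), 'enhancement.txt'))
# patches/342/docs/enhancement.txt
```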

“Revision” history

One other potential use is when you often save new files to existing objects and you'd like to keep track of those files together. The preceding examples assume that each file belongs to a new Patch. But what if you want users to be able to update a patch without creating a new one? The process is essentially the same as before, but with one important detail.

If you're doing this, you'll probably want the ID of the Patch to be included in the path or filename. That's a bit tricky, since Django doesn't save the object until after the file has been written. This lets it make a single trip to the database once it's figured out what filename was actually used. In this case, however, we need the ID before saving the file, so we'll have to sacrifice that slight optimization. We only need it the first time the object is saved, though, so we can at least keep the extra cost to a minimum.

def _save_FIELD_file(self, field, filename, raw_contents, save=True):
    if not self.id:
        self.save() # Make sure it has an ID
    original_upload_to = field.upload_to
    if field.name == 'code':
        # patches/<ticket_id>/<patch_id>
        field.upload_to = '%s/%s/%s' % (
            field.upload_to,
            self.ticket_id,
            self.id)
    elif field.name == 'docs':
        # patches/<ticket_id>/<patch_id>/docs
        code = self._meta.get_field('code')
        field.upload_to = '%s/%s/%s/%s' % (
            code.upload_to,
            self.ticket_id,
            self.id,
            field.upload_to)
    try:
        super(Patch, self)._save_FIELD_file(field, filename, raw_contents, save)
    finally:
        field.upload_to = original_upload_to

Just a few quick changes and we're off to the races. Now each ticket's directory will have a set of subdirectories, one for each patch. Each patch subdirectory will hold every code file uploaded under that patch, with the documentation in a docs subdirectory alongside them. This is a less likely scenario, but still possible.
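To see why only the first upload costs an extra trip to the database, here's a toy stand-in that counts saves. ToyPatch and its save_code_file method are hypothetical; save() merely hands out IDs from a counter, and nothing here is Django's API.

```python
import itertools
import posixpath

# Toy ID generator standing in for the database's auto-increment.
_ids = itertools.count(1)


class ToyPatch(object):
    def __init__(self, ticket_id):
        self.ticket_id = ticket_id
        self.id = None
        self.saves = 0  # number of simulated database trips
        self.files = []

    def save(self):
        self.saves += 1
        if self.id is None:
            self.id = next(_ids)

    def save_code_file(self, filename, save=True):
        if not self.id:
            self.save()  # extra trip, needed only to obtain an ID
        path = posixpath.join('patches', str(self.ticket_id), str(self.id), filename)
        self.files.append(path)
        if save:
            self.save()  # the normal save after the file is written
        return path


patch = ToyPatch(ticket_id=342)
print(patch.save_code_file('fix.diff'))   # patches/342/1/fix.diff
print(patch.save_code_file('fix2.diff'))  # patches/342/1/fix2.diff
print(patch.saves)                        # 3
```

The first upload takes two saves and every later one takes a single save, and all files for the patch land in the same directory.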

Conclusion

So, even though all of this is likely to change in the not-so-distant future, I think that pretty much sums up how to customize file storage in current releases of Django. If you find yourself doing a lot of this, though, be sure to keep an eye on the filestorage patch, as it aims to make all of this a whole lot nicer.