Recently, I’ve noticed a good bit of chatter on the mailing lists about how to override the filename of an uploaded file before saving it. Django currently provides options for customizing the location based on upload time, but nothing else. Often, projects find the need to store attachments in directories according to details of the object or a related object. I’m doing a good bit of work on FileField at the moment, but that’s still a way off and doesn’t yet address this issue anyway.
With that said, I’ll be perfectly clear. The information below is based on details of how FileField currently works. Once the filestorage patch makes it into trunk, parts of this article will probably work for a while, but I make no guarantees about how long. If you follow trunk and update often, you’ve been warned.
Fair warning: Much of this article relies on implementation details internal to Django. I haven’t tested out every last facet of FileFields and how the techniques below might affect their behavior. It should work fine for the purposes I describe, but I make no guarantees of any kind.
Currently, the only technique you can reliably use is to override _save_FIELD_file on your model. This isn’t quite the same as the save_FOO_file method that’s officially documented. Instead, the method I’m referring to is actually named _save_FIELD_file. Note the leading underscore and the fact that FIELD should not be replaced with the name of the appropriate field. More on that detail later. The documented save_FOO_file method actually gets assigned during creation of the model, so even if you define it, it’ll just get overwritten. Every time. Without your consent.
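To make the overwriting a bit more concrete, here’s a toy sketch of the idea. It is not Django’s source: ToyFileField, ToyPatch and the simplified contribute_to_class signature are all made up for illustration, and I’m only assuming the real mechanism behaves roughly like this.

```python
class ToyFileField(object):
    """Hypothetical stand-in for FileField; not Django's real class."""

    def __init__(self, name):
        self.name = name

    def contribute_to_class(self, cls):
        # Build a per-field wrapper and attach it to the class, replacing
        # any attribute that already uses the same name.
        def save_file(instance, filename, raw_contents, save=True):
            return instance._save_FIELD_file(self, filename, raw_contents, save)
        setattr(cls, 'save_%s_file' % self.name, save_file)


class ToyPatch(object):
    def save_code_file(self, filename, raw_contents, save=True):
        # This definition is silently replaced during class setup below.
        return 'custom save_code_file'

    def _save_FIELD_file(self, field, filename, raw_contents, save=True):
        # The one generic method every per-field wrapper ends up calling.
        return 'generic handler for %s: %s' % (field.name, filename)


# Roughly what happens while the model class is being set up.
ToyFileField('code').contribute_to_class(ToyPatch)

result = ToyPatch().save_code_file('quick_fix.diff', b'...')
print(result)
```

The custom save_code_file never runs, which is why the override has to live in _save_FIELD_file instead.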
The signature for _save_FIELD_file looks like this:

    def _save_FIELD_file(self, field, filename, raw_contents, save=True)

The arguments it takes are as follows:
self: Since we’re working with a method on a model, this will be the current model instance. This is where you’ll get attributes of the object.
field: The actual FileField object for the attribute being referenced. If you call save_code_file, this would be the FileField you assigned to the name code.
filename: The original filename provided to save_FOO_file. Unless you’re doing something special, this is the same name provided by the web browser when the file was uploaded.
raw_contents: As the name implies, this is the guts of the file, everything that will get saved on the server.
save: A value of True (the default) means the model instance should be saved after saving the file, while False will leave the instance unsaved. This exists to prevent having to hit the database too many times if there’s more to be done on the object after saving the file.
For the following sections, we’ll start with models that look like this, and add to them as we go along:
    from django.db import models
    from django.contrib.auth.models import User

    class Ticket(models.Model):
        author = models.ForeignKey(User)
        title = models.CharField(maxlength=255)
        description = models.TextField()

    class Patch(models.Model):
        ticket = models.ForeignKey(Ticket)
        author = models.ForeignKey(User)
        code = models.FileField(upload_to='patches')
Obviously the models above will run into a naming conflict if two patches ever have the same filename. Django automatically works around this by adding underscores to the end of the filename until it finds an available name. However, if you have any other code (or your own eyes) that needs to look for these files, it’d be impossible to tell which patch goes with which attachment.
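Here’s a rough sketch of that underscore workaround, just to show why the resulting names are hard to trace back to a patch. It is not Django’s code: available_name is a hypothetical helper, a set of names stands in for the filesystem, and placing the underscore before the extension is my assumption.

```python
import os


def available_name(filename, existing):
    """Add underscores until `filename` no longer collides with `existing`."""
    root, ext = os.path.splitext(filename)
    while filename in existing:
        root += '_'               # assumed placement: just before the extension
        filename = root + ext
    return filename


taken = {'quick_fix.diff', 'quick_fix_.diff'}
print(available_name('quick_fix.diff', taken))  # quick_fix__.diff
```

Nothing in quick_fix__.diff tells you which ticket or patch it came from, and that’s the problem the rest of this article works around.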
Thankfully, the filename is the easiest thing to change, and it can often be done with just a few lines of new code on your model. By adding the following method to the Patch model, we’ll insert the id of the ticket into the beginning of the filename, so it can be easily identified outside of Django.
    def _save_FIELD_file(self, field, filename, raw_contents, save=True):
        # Prefix the uploaded filename with the related ticket's ID.
        filename = "%s_%s" % (self.ticket_id, filename)
        super(Patch, self)._save_FIELD_file(field, filename, raw_contents, save)
Pretty painless, right? Now, if a user uploads a file called quick_fix.diff for a ticket with an ID of 342, this method will save that file as 342_quick_fix.diff. If you have an automated script that applies patches and tests the new code, it can figure out which ticket each patch was attached to and include that number in a success/failure report. Or whatever you want to do with that ID.
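On the script side, pulling the ticket number back out might look something like this. split_ticket_filename is a name I made up, and it assumes every stored name follows the ID-underscore-filename pattern used above.

```python
def split_ticket_filename(stored_name):
    """Split '342_quick_fix.diff' into (342, 'quick_fix.diff')."""
    # Split on the first underscore only, since the original filename
    # may well contain underscores of its own.
    ticket_id, original_name = stored_name.split('_', 1)
    return int(ticket_id), original_name


print(split_ticket_filename('342_quick_fix.diff'))  # (342, 'quick_fix.diff')
```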
A more common request is to be able to specify a subdirectory based on the current object’s details. This would be similar to the above example, but would be a bit easier to deal with manually, since the files wouldn’t all be in one big directory, but split off into many smaller directories instead. By replacing the method we created above, we’ll create a subdirectory with the username of the user who submitted the patch. It’s a bit more complicated, but I’ll explain.
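    def _save_FIELD_file(self, field, filename, raw_contents, save=True):
        # Remember the field's configured directory so it can be restored.
        original_upload_to = field.upload_to
        # Patch has an author field (not user), so use that for the username.
        field.upload_to = '%s/%s' % (field.upload_to, self.author.username)
        super(Patch, self)._save_FIELD_file(field, filename, raw_contents, save)
        # Put the original value back so later saves aren't affected.
        field.upload_to = original_upload_to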
Now you can easily write a separate script to tell you how many patches a user has submitted, how big they are, how many lines, etc., all without touching the database. Neat!
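A script along those lines could be as small as the sketch below. patch_stats is a hypothetical helper, and it assumes the one-directory-per-username layout described above; the temporary directory is only there so the example runs on its own.

```python
import os
import tempfile


def patch_stats(patches_root):
    """Return {username: (file_count, total_bytes)} for each user directory."""
    stats = {}
    for username in sorted(os.listdir(patches_root)):
        user_dir = os.path.join(patches_root, username)
        if not os.path.isdir(user_dir):
            continue
        paths = [os.path.join(user_dir, name) for name in os.listdir(user_dir)]
        sizes = [os.path.getsize(path) for path in paths if os.path.isfile(path)]
        stats[username] = (len(sizes), sum(sizes))
    return stats


# Build a throwaway directory tree to try it out.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, 'gulopine'))
with open(os.path.join(root, 'gulopine', 'new_feature.diff'), 'wb') as handle:
    handle.write(b'12345')

demo_stats = patch_stats(root)
print(demo_stats)
```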
Internally, Django strips off any directory information filename might have in it, for security, so we have to put the new subdirectory in the field’s upload_to attribute instead. This actually works out well anyway, since Django also includes logic to create any necessary subdirectories if they don’t already exist. Bonus. Just make sure to always include the original upload_to at the beginning of the new value, so the subdirectories get put in the right spot.
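A simplified picture of why the subdirectory has to go in upload_to rather than in the filename: storage_path below is a hypothetical helper that only imitates the behavior I’ve described, not Django’s actual implementation.

```python
import posixpath


def storage_path(upload_to, filename):
    """Imitate the described behavior: drop directories from the filename."""
    safe_name = posixpath.basename(filename)   # 'a/b.diff' becomes 'b.diff'
    return posixpath.join(upload_to, safe_name)


# A directory smuggled into the filename is thrown away...
print(storage_path('patches', 'gulopine/quick_fix.diff'))
# patches/quick_fix.diff

# ...while one placed in upload_to survives.
print(storage_path('patches/gulopine', 'quick_fix.diff'))
# patches/gulopine/quick_fix.diff
```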
One thing to keep in mind is that blindly modifying upload_to would affect all future files saved by that field, which we obviously don’t want. This is supposed to be limited to just the one file. We have to save the original upload_to value before modifying it, so it can be replaced once the file’s in place. As the code shows, this is very easy to do, but just know that it absolutely must be done. This messes around with things that ought not be messed with, so be sure to clean up after yourself. Again, you’ve been warned.
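If you want the cleanup to happen even when saving the file blows up, one option is to wrap the swap in try/finally. The sketch below does that with a context manager; temporary_upload_to and FakeField are names I invented for the example, and I haven’t run this against a real FileField.

```python
from contextlib import contextmanager


@contextmanager
def temporary_upload_to(field, new_value):
    """Swap field.upload_to for the duration of a with-block."""
    original = field.upload_to
    field.upload_to = new_value
    try:
        yield field
    finally:
        # Runs on success and on error, so the field is always restored.
        field.upload_to = original


class FakeField(object):
    """Hypothetical stand-in for a FileField with an upload_to attribute."""

    def __init__(self, upload_to):
        self.upload_to = upload_to


field = FakeField('patches')
seen = []
try:
    with temporary_upload_to(field, 'patches/gulopine'):
        seen.append(field.upload_to)
        raise IOError('disk full')   # simulate a failed save
except IOError:
    pass

print(seen, field.upload_to)  # ['patches/gulopine'] patches
```

Inside _save_FIELD_file, the call to the parent method would sit in the with-block where the simulated error is raised here.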
_save_FIELD_file is declared once per model. This is the case no matter how many FileFields you have declared. Consider adding another field so that documentation patches can be maintained separately from code. Here’s how our model would look after this change, but without the method:
    class Patch(models.Model):
        ticket = models.ForeignKey(Ticket)
        author = models.ForeignKey(User)
        code = models.FileField(upload_to='patches')
        docs = models.FileField(upload_to='docs')
You could add in either of the previous methods to this model, and it would work out fine, as long as you don’t mind having, for instance, code stored in patches/gulopine/new_feature.diff and docs stored in docs/gulopine/new_feature.txt. For this, though, I’ll use the second version, but with the ticket ID in place of the username, and with one other change. It’s a fairly minor change, but it illustrates an important point.
    def _save_FIELD_file(self, field, filename, raw_contents, save=True):
        original_upload_to = field.upload_to
        if field.name == 'code':
            # Code goes straight into the ticket's directory.
            field.upload_to = '%s/%s' % (
                field.upload_to, self.ticket_id)
        elif field.name == 'docs':
            # Docs go into a subdirectory of the ticket's code directory.
            code = self._meta.get_field('code')
            field.upload_to = '%s/%s/%s' % (
                code.upload_to, self.ticket_id, field.upload_to)
        super(Patch, self)._save_FIELD_file(field, filename, raw_contents, save)
        field.upload_to = original_upload_to
As you can see, the method can use the field it was given to determine which type of file it’s dealing with. In this case, it stores all files, regardless of type, within a directory such as patches/342. Code is stored directly in this directory, while documentation is shuffled off into a docs subdirectory, so you might have patches/342/docs sitting alongside the code files.
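The branching on the field name boils down to something like this standalone sketch. target_directory and BASE_UPLOAD_TO are hypothetical names; the dictionary just mirrors the upload_to values from the model above.

```python
import posixpath

# Mirrors the upload_to values declared on the two FileFields.
BASE_UPLOAD_TO = {'code': 'patches', 'docs': 'docs'}


def target_directory(field_name, ticket_id):
    """Work out where a file for the given field and ticket would land."""
    ticket_dir = posixpath.join(BASE_UPLOAD_TO['code'], str(ticket_id))
    if field_name == 'code':
        return ticket_dir
    if field_name == 'docs':
        return posixpath.join(ticket_dir, BASE_UPLOAD_TO['docs'])
    # Any other field keeps whatever directory it was configured with.
    return BASE_UPLOAD_TO[field_name]


print(target_directory('code', 342))  # patches/342
print(target_directory('docs', 342))  # patches/342/docs
```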
One other potential use is where you often save new files to existing objects, and you’d like to keep track of those files together. The preceding examples assume that each file belongs to a new Patch, so only the related object stays the same from one upload to the next. But what if you want users to be able to update the patch without creating a new one? The process is essentially the same as before, but with one important detail.
If you’re doing this, you’ll probably want the ID of the Patch to be included in the path or filename. Doing this is a bit tricky, since Django doesn’t save the object until after the file has been saved. This allows it to make one trip to the database, after it’s figured out what filename was actually used. In this case, however, we need the ID before saving the file, so we’ll have to sacrifice that slight optimization. We only need it the first time the object is saved, though, so we can at least keep it to a minimum.
    def _save_FIELD_file(self, field, filename, raw_contents, save=True):
        if not self.id:
            self.save()  # Make sure it has an ID
        original_upload_to = field.upload_to
        if field.name == 'code':
            field.upload_to = '%s/%s/%s' % (
                field.upload_to, self.ticket_id, self.id)
        elif field.name == 'docs':
            code = self._meta.get_field('code')
            field.upload_to = '%s/%s/%s/%s' % (
                code.upload_to, self.ticket_id, self.id, field.upload_to)
        super(Patch, self)._save_FIELD_file(field, filename, raw_contents, save)
        field.upload_to = original_upload_to
Just a few quick changes and we’re off to the races. Now each ticket’s directory will have a set of subdirectories, one for each patch. Each patch subdirectory will have a set of files that have been updated under that patch, and documentation as well. This is a less likely scenario, but still possible.
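To see what the extra save costs, here’s a toy model that counts database writes. ToyRecord is entirely made up and only imitates the save-first-if-there’s-no-ID pattern; it says nothing about how many queries Django itself would issue.

```python
class ToyRecord(object):
    """Hypothetical object that hands out IDs and counts its own saves."""

    _next_id = 1

    def __init__(self):
        self.id = None
        self.db_writes = 0

    def save(self):
        if self.id is None:
            # First save: pretend the database assigned a primary key.
            self.id = ToyRecord._next_id
            ToyRecord._next_id += 1
        self.db_writes += 1

    def save_file(self, filename, save=True):
        if not self.id:
            self.save()            # extra write, only when there is no ID yet
        path = 'patches/%s/%s' % (self.id, filename)
        if save:
            self.save()            # the usual write after the file is stored
        return path


record = ToyRecord()
first_path = record.save_file('quick_fix.diff')
writes_after_first = record.db_writes     # 2: the extra save plus the usual one
second_path = record.save_file('quick_fix_v2.diff')
writes_after_second = record.db_writes    # 3: only the usual save this time
```

Only the very first upload pays for the second write; every later file saved to the same object goes back to a single trip.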
So, even though all of this is likely to change in the not-so-distant future, I think that pretty much sums up how to customize file storage in current releases of Django. If you find yourself doing a lot of this, though, be sure to keep an eye on the filestorage patch, as it aims to make all of this a whole lot nicer.