I’ve had a project rattling around in my head for a few years now. Take Django’s declarative approach to models and forms, and apply it to the definition of binary file formats. I know I’m not the only one to have thought of it, but I think I’m the first to take it seriously as a project. So far, it’s had many names and taken many forms, but I think I’ve finally found an approach that’ll help me actually get the thing done: usage driven design.
I doubt I’m the first to come up with this, and there’s probably an oddly-named Wikipedia page somewhere describing in excruciating detail how to use it in a business setting or something. For me, it just boils down to using the project I want to make, before I’ve even made it.
There are several file formats in a particular domain that I really want to support with this framework, so what I had done in the past was write the code to support those formats, test them rigorously and congratulate myself on a job well done, before moving on to implement more formats and add to the framework as necessary to support them. I’ve done this at least 4 times now, and though I got better at understanding and anticipating the problems, I’d always find myself fighting with assumptions I made too early in the process. It tends to go a little something like this:
Ooh, a chunked format! I can do those! Wait, they put the size of the chunk before the type indicator? That means I can’t reuse the code I put together for IFF files, so I guess I’ll now have to define what it means to be a chunk, so I can rearrange it if necessary…
And there’s a CRC value after the payload? I can easily reference one field within another, but this CRC includes the chunk type indicator as well, so I’ll need a way to specify the start and end values, as well as a way to get the raw data back out of the file again, so the CRC value can be verified…
And ugh, the size includes the indicator value and the CRC value, instead of just the payload? Now I’ll need to be able to specify the payload size as an expression, so I can subtract 8 from whatever value was read from the file, before reading that data in…
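To make those three surprises concrete, here’s a rough hand-written sketch of reading one such chunk with nothing but the standard library. It isn’t the framework and it isn’t any particular format: the big-endian byte order, the 4-byte field widths and the use of CRC-32 are all assumptions I’m making for illustration, and `read_chunk` and `build_chunk` are made-up helper names.

```python
import struct
import zlib


def read_chunk(data, offset=0):
    """Parse one chunk laid out as: size, type, payload, CRC.

    Assumptions (illustrative only): every fixed field is 4 bytes,
    integers are big-endian, the stored size counts the type indicator
    and the CRC as well as the payload, and the CRC is a CRC-32 over
    the type indicator plus the payload.
    """
    # The size comes first, before the type indicator.
    (size,) = struct.unpack_from(">I", data, offset)
    if size < 8:
        raise ValueError("stored size is too small to hold type and CRC")

    type_start = offset + 4
    payload_start = offset + 8
    # The payload size is an expression, not the raw value from the file.
    payload_end = payload_start + (size - 8)
    if payload_end + 4 > len(data):
        raise ValueError("chunk runs past the end of the data")

    chunk_type = data[type_start:payload_start]
    payload = data[payload_start:payload_end]

    # The CRC covers a span of raw bytes that crosses field boundaries.
    (stored_crc,) = struct.unpack_from(">I", data, payload_end)
    computed_crc = zlib.crc32(data[type_start:payload_end]) & 0xFFFFFFFF
    if stored_crc != computed_crc:
        raise ValueError("CRC mismatch")

    # Return the offset of the next chunk so the caller can keep going.
    return chunk_type, payload, payload_end + 4


def build_chunk(chunk_type, payload):
    """Build a chunk in the same assumed layout, for trying things out."""
    body = chunk_type + payload
    crc = zlib.crc32(body) & 0xFFFFFFFF
    return struct.pack(">I", len(payload) + 8) + body + struct.pack(">I", crc)
```

Every comment in `read_chunk` marks something a declarative framework would have to let me express: field order, a size given as an expression, and a checksum over an arbitrary span of raw bytes.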
And all that came up by just trying to implement a few common image formats after designing the framework to work with some less common music formats. I kept having to do so much redesigning that I’d end up throwing the whole thing away and starting over! Needless to say, I was getting nowhere fast, and I needed a new approach.
What I’m doing instead is gathering a laundry list of formats I want to implement (I’ve got about a hundred in mind so far), and rather than trying to wedge in support for each of their edge cases after building the framework, I’m going to implement them first, before writing a single line of framework-level code.
Basically, I’m just designing the framework, rather than trying to implement it at the same time. This way, if I need to make a change to the design, I’m not held back by implementation details based on assumptions I should never have made. I can’t hope to plan for every single use case, but with so many formats to work through, I’m likely to catch most of the oddities and design a framework that’s flexible enough to accommodate most of the rest after the fact.
As much as I love programming, I’m finding this new process to be very fun. My only real restriction is to be consistent across different formats. If I used something called PositiveInteger there, I should use it here as well. Make sure the arguments and other semantics all match up, and I’m on my way. I still have to go back through existing formats and make adjustments if I need to modify the semantics of something existing, but I’d have to do that anyway. This way I’m only adjusting the formats, rather than the framework as well.
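For a rough idea of what that consistency looks like, here’s a toy sketch. To be clear, this is not the framework’s API: only the name PositiveInteger comes from my format definitions, while the `size` argument, the `Format` base class, the big-endian byte order and the two example formats are all invented here for illustration.

```python
import io
import struct


class PositiveInteger:
    """Hypothetical field: an unsigned, big-endian integer of `size` bytes."""

    _codes = {1: "B", 2: "H", 4: "I", 8: "Q"}

    def __init__(self, size):
        self.size = size
        self.code = ">" + self._codes[size]

    def read(self, stream):
        raw = stream.read(self.size)
        if len(raw) != self.size:
            raise ValueError("unexpected end of data")
        return struct.unpack(self.code, raw)[0]


class Format:
    """Hypothetical base class: reads fields in the order they were declared."""

    def __init__(self, stream):
        # Class attributes keep their definition order, so the declaration
        # itself describes the layout of the file.
        for name, field in vars(type(self)).items():
            if isinstance(field, PositiveInteger):
                setattr(self, name, field.read(stream))


# Two unrelated (made-up) formats using the same field with the same semantics.
class Thumbnail(Format):
    width = PositiveInteger(size=2)
    height = PositiveInteger(size=2)


class Sample(Format):
    rate = PositiveInteger(size=4)
    channels = PositiveInteger(size=1)


thumb = Thumbnail(io.BytesIO(b"\x00\x10\x00\x20"))
print(thumb.width, thumb.height)
```

If I later decide `size` should be counted in bits instead of bytes, both definitions change in the same way, and there’s no framework code to rewrite alongside them.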
I should point out that when I spoke to a friend about this recently over sushi, he referred to it as Readme Driven Development, but I’m not so sure it applies so cleanly. It does seem to provide many of the same benefits he lists, but I’m not really documenting anything. Someone reading over these format implementations probably wouldn’t know how to actually use the framework yet, because I’m only writing code to use it, not words. Plus, I’m not limiting myself to a single file, because there are just too many cases for that. I’m probably using the same approach as it applies to my problem domain, so I don’t mind the comparison anyway.
Mostly, though, I don’t like to think of this step as development, or even driving development. Instead, this is all about design. Development does include design, but it also includes all the more mundane, real-world concerns like performance and security. I’m not worrying about those at the moment, because the design is so important to this project. When I do worry about that stuff later, it might require some more changes, but those wouldn’t be specific to any format, so they should be easier to make as a whole. Besides, I’ll have hundreds of fully automated tests based on a few example files for each of the formats I’ve implemented by then, to make sure I don’t accidentally break anything.
So, I guess it’s usage driven design, which later becomes test driven development. I’m not much for trying to define stuff like this, though. I’m just trying to do what I think will help me get this thing working. It’s still a little too early to say how well it’ll work overall, but my experience so far has been quite positive. The flexibility has been incredibly useful, and I’m finally able to incorporate features that would’ve previously taken me ages.
If you’re interested in my progress so far, feel free to look over the examples on GitHub. I plan to introduce a few new formats every week, depending on how much time I have to hack on them, and how complicated they are. I’m willing to take suggestions on future formats, but I’d like to at least organize and post my list first, so I don’t get a hundred requests for things I’m already planning.