by SuperNinKenDo
901 days ago
This is something I've been interested in for a while. I've collected a few links people have already posted to their own projects or write-ups here and elsewhere, but is there any single excellent resource for learning how to do this? I've a number of dead and/or proprietary formats that I've always wanted to crack open, but I'm totally overwhelmed with where to start.
First, make sure you know what the format is actually supposed to encode. For example, a file of (say) 40 KB is unlikely to be a raster image. The file name, if there is one, helps a lot in narrowing the scope.
Second, you should have some understanding of similar file formats. I generally recommend studying PNG first, because it is a good example of both a typical structured file format and a raster image format. (Don't delve into the compression, though; bitwise analysis is much harder.) This is also why you need to know what the format is for: formats with the same goal tend to have similar structures.
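To give a feel for what "typical structured file format" means here, this is a rough sketch of walking PNG's chunk layout (8-byte signature, then repeated length / type / data / CRC records). The helper names `make_chunk` and `iter_chunks` are my own, and the 1x1 sample image is built in memory only so the snippet runs without a file on disk.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one chunk: big-endian length, 4-byte type, data, CRC-32."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(blob: bytes):
    """Yield (type, length, crc_ok) for every chunk after the signature."""
    if blob[:8] != PNG_SIGNATURE:
        raise ValueError("missing PNG signature")
    offset = 8
    while offset < len(blob):
        length, chunk_type = struct.unpack_from(">I4s", blob, offset)
        data = blob[offset + 8 : offset + 8 + length]
        (stored_crc,) = struct.unpack_from(">I", blob, offset + 8 + length)
        # The CRC covers the type and data fields, not the length field.
        crc_ok = stored_crc == (zlib.crc32(chunk_type + data) & 0xFFFFFFFF)
        yield chunk_type.decode("ascii"), length, crc_ok
        offset += 12 + length  # 4 length + 4 type + data + 4 CRC


# A 1x1, 8-bit grayscale image, assembled by hand for the demo.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
idat = zlib.compress(b"\x00\x00")  # one filter byte plus one pixel
sample_png = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", ihdr)
    + make_chunk(b"IDAT", idat)
    + make_chunk(b"IEND", b"")
)

chunks = list(iter_chunks(sample_png))
for name, length, crc_ok in chunks:
    print(name, length, "crc ok" if crc_ok else "crc BAD")
```

Length-prefixed records with a type tag and a checksum show up in a lot of other formats, so once you have seen this pattern it is a reasonable first hypothesis to test against an unknown file.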
Third, collect as many examples as possible. You can line them up to see commonalities and differences and to spot patterns. It is even better if you can actively generate different files yourself. This is generally the last hope when you have run out of reasonable hypotheses.
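"Lining them up" can be as simple as comparing the same byte offset across all samples. A minimal sketch, assuming the interesting fields sit at fixed offsets near the start of the file; the function name `classify_offsets` and the three sample byte strings are made up for illustration and do not come from any real format.

```python
def classify_offsets(samples):
    """Split offsets into those identical in every sample and those that vary."""
    shortest = min(len(s) for s in samples)
    constant, varying = [], []
    for offset in range(shortest):
        values = {s[offset] for s in samples}
        (constant if len(values) == 1 else varying).append(offset)
    return constant, varying


# Hypothetical samples: same magic and version, different size field and body.
samples = [
    b"FMT1\x01\x00\x10\x00AAAA",
    b"FMT1\x01\x00\x20\x00BBBB",
    b"FMT1\x01\x00\x08\x00CCCC",
]

constant, varying = classify_offsets(samples)
print("constant offsets:", constant)
print("varying offsets: ", varying)
```

Constant runs are usually magic numbers, versions, or padding; an isolated varying byte next to constant ones is a good candidate for a size, count, or flag field.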
Fourth, optimize the feedback loop. You will have to do a lot of hypothesizing, validation and automation. You can't really reduce the number of iterations, but you can reduce the time a single iteration takes. Use a comfortable scripting language with good support for binary data. I tend to use vanilla Python with struct and build everything else myself, but there are several libraries that help greatly if you don't feel like doing so.
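For what one iteration can look like with plain `struct`: write the current guess down as a format string, unpack, and check something falsifiable (here, whether a guessed length field accounts for the file size). The header layout, the `DEMO` magic, and the name `check_hypothesis` are all hypothetical; swap in whatever your own guess is.

```python
import struct

# Hypothetical guess: 4-byte magic, u16 version, u16 record count,
# u32 payload length, all little-endian.
HEADER = struct.Struct("<4sHHI")


def check_hypothesis(blob: bytes):
    """Unpack the guessed header and test whether the length field fits."""
    if len(blob) < HEADER.size:
        return None
    magic, version, count, payload_len = HEADER.unpack_from(blob, 0)
    return {
        "magic": magic,
        "version": version,
        "count": count,
        "payload_len": payload_len,
        # If the guess is right, header size plus payload equals file size.
        "length_matches": HEADER.size + payload_len == len(blob),
    }


# Synthetic file that follows the guessed layout, plus a truncated copy.
good = HEADER.pack(b"DEMO", 1, 2, 8) + bytes(8)
bad = good[:-1]

print(check_hypothesis(good))
print(check_hypothesis(bad))
```

Run something like this over the whole sample collection at once; a guess that holds for every file is worth keeping, and one that fails on a single file tells you exactly which file to look at next.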