Hacker News
by Someone 2373 days ago
First things to check:

- Can you build it, and if so, does the resulting binary work? If it does, look at the makefile (or equivalent) to hunt for compilation switches. If it doesn’t build, spend some time trying to make it build. If you don’t succeed, tell your manager that this will be a lot harder (if the code doesn’t build and run, you don’t even know whether you have all the code, IDEs may have trouble analysing it, etc.).

- Given that this is a decompressor, do you have access to the compressor, too? Chances are the makefile will give it to you. If you don’t have one, that isn’t a showstopper, but it may make things more difficult, so inform your manager.

- Is the code under source control? If so, look at the history. Going back to older releases may give you an easier code base to work with. Given that, elsewhere, you say “Its a decompression algorithm with a thin CLI around it”, that may help a lot, because an older release may lack various optimisations and config options.
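
If you do have both the compressor and the decompressor, a round trip (compress a sample file, decompress the result, compare with the original) is a cheap way to check that the binary works. Here is a minimal sketch of the comparison step only; `files_identical` is a name I made up for illustration, not something from your code base, and it assumes ordinary files on disk.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if both files have identical contents,
   0 if they differ, and -1 if either file cannot be opened or read. */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    int result = 1;

    if (a == NULL || b == NULL) {
        result = -1;
    } else {
        unsigned char buf_a[4096], buf_b[4096];
        size_t n_a, n_b;

        /* Compare chunk by chunk until both files are exhausted. */
        do {
            n_a = fread(buf_a, 1, sizeof buf_a, a);
            n_b = fread(buf_b, 1, sizeof buf_b, b);
            if (n_a != n_b || memcmp(buf_a, buf_b, n_a) != 0) {
                result = 0;
                break;
            }
        } while (n_a > 0);

        /* A short read caused by an I/O error is not a real mismatch. */
        if (ferror(a) || ferror(b))
            result = -1;
    }

    if (a != NULL) fclose(a);
    if (b != NULL) fclose(b);
    return result;
}
```

Call it with the path of the original sample and the path of the decompressed output. Anything other than 1 on a few representative inputs is worth knowing about before you start reading code.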

You can use various tools to visualise the call graph, but this being a decompressor, there are likely to be many low-level functions whose purpose you can’t tell from the graph alone. If you aren’t familiar with compression algorithms, or with this algorithm in particular, try googling the names of functions, fields, or variables.

In the end, 30k lines of C isn’t _that_ much. It may just be a matter of grinding through. If you browse 1,000 lines an hour (about 3.6 seconds per line), that’s only 30 hours, doable in a week (and a week is not much if you inherited the code base and aren’t just visiting it). Just dive in; by the time you’ve spent 10 hours, you will probably have generated some questions that you want answered, discovered some #defines that control compilation, etc. Eventually you will have to read every line, but don’t feel obliged to initially; just follow your instincts (and if the business side has some short-term priorities, let those guide you).

On the one hand, decompression algorithms are typically of above-average complexity, which makes the reading harder. On the other hand, it is highly likely that there are various CPU-specific and/or OS-specific code paths that you can (initially) ignore, significantly decreasing the number of lines you have to read.
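
To illustrate what such a code path tends to look like, here is a sketch of a pattern that is common in this kind of code: a platform-specific fast path guarded by predefined compiler macros, with a portable fallback that computes the same value. `read_le32` is a hypothetical example, not taken from your code base. On a first pass you can usually read only the `#else` branch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical example: read a 32-bit little-endian value from a byte buffer. */
uint32_t read_le32(const unsigned char *p)
{
#if defined(__x86_64__) || defined(_M_X64)
    /* CPU-specific path: x86-64 is little-endian, so the bytes can be
       copied straight into the integer. */
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
#else
    /* Portable fallback: assemble the value byte by byte.
       This is the branch to read first. */
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
#endif
}
```

With GCC or Clang, running the preprocessor with `-dM -E` on an empty input lists the macros that are predefined for your target, which tells you which of these branches your build actually compiles.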