| 1. Start at the simplest subsystem, and work from there. This is especially true when working with third-party, legacy, or undocumented code. In the absence of documentation, your only way forward is to read the source code[0]. Find the main() of the simplest/smallest module in the system, and begin tracing through the code. This could be with a debugger, print statements, or the search functionality in your editor/IDE (Find Usages and Go to Declaration in IntelliJ are lifesavers). 2. Don't be afraid to break things apart. If the simplest module in the system is overwhelmingly complex, start commenting out parts of the code. Go until you have effectively reduced it to "Hello, World", if you have to. From there, you can gradually add features back. 3. Constantly test your assumptions. Don't assume the code does what its comments say it does. Don't assume a config flag produces the behavior the documentation claims it does. Do take inventory of your assumptions whenever the behavior of the system contradicts your current understanding of how it works. You should be able to back up any claims about the system with empirical evidence, e.g. when I change A to B, X happens; when I change A to C, Y happens. [0] http://blog.codinghorror.com/learn-to-read-the-source-luke/ |
When learning a distributed system, treating the code as its own documentation is not ideal. One could spend an enormous amount of time just to understand a single simple business use case.
Then again, the fact that it takes time doesn't make the advice wrong.
Still, I would much prefer a system that documents itself with every change.
The point about breaking things apart is all well and good, but to truly understand the system, after breaking it one also needs to put the pieces back together one by one and understand their interactions. That way, if any one module in the pool fails, one knows exactly what would have caused it and how to make the system more resilient to failures.
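To make that concrete, here is a minimal sketch in Python of what I mean. Everything in it is hypothetical: the three stages (`parse`, `enrich`, `store`) are toy stand-ins for real modules, and `rebuild_incrementally` and `failure_map` are names I made up for illustration. It first re-enables the stages one at a time, then swaps each one for a failing stub and records how that failure surfaces.

```python
from typing import Callable, Dict, List, Tuple

Stage = Callable[[int], int]

# Hypothetical stages standing in for real modules of the system.
def parse(x: int) -> int:
    return x + 1

def enrich(x: int) -> int:
    return x * 2

def store(x: int) -> int:
    return x - 3

PIPELINE: Dict[str, Stage] = {"parse": parse, "enrich": enrich, "store": store}

def broken(_: int) -> int:
    """Stub that simulates a module failing."""
    raise RuntimeError("injected failure")

def run(stages: Dict[str, Stage], value: int) -> int:
    """Run the stages in order, naming the stage that failed."""
    for name, stage in stages.items():
        try:
            value = stage(value)
        except Exception as exc:
            raise RuntimeError(f"stage {name!r} failed") from exc
    return value

def rebuild_incrementally(
    stages: Dict[str, Stage], value: int
) -> List[Tuple[List[str], int]]:
    """Re-enable stages one at a time and record the output after each step."""
    enabled: Dict[str, Stage] = {}
    results = []
    for name, stage in stages.items():
        enabled[name] = stage
        results.append((list(enabled), run(enabled, value)))
    return results

def failure_map(stages: Dict[str, Stage], value: int) -> Dict[str, str]:
    """Break each stage in turn and record the error the caller observes."""
    observed = {}
    for victim in stages:
        patched = dict(stages)
        patched[victim] = broken
        try:
            run(patched, value)
            observed[victim] = "no error"
        except RuntimeError as exc:
            observed[victim] = str(exc)
    return observed

if __name__ == "__main__":
    for enabled, output in rebuild_incrementally(PIPELINE, 1):
        print(enabled, "->", output)
    for victim, symptom in failure_map(PIPELINE, 1).items():
        print(f"break {victim}: {symptom}")
```

In a real distributed system the stubs would be killed processes or dropped network calls rather than a Python exception, but the bookkeeping is the same: a table of "when I break A, the symptom is X" that you have actually observed rather than assumed.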