| A large codebase under active development presents a moving target; Even if you knew how something worked last week, that code might have changed twice since then. Detailed knowledge in the solution domain gets outdated fast. To address this issues, I work with something I call a behavioral code analysis. In a behavioral code analysis, you prioritize the code based on its relative importance and the likelihood that you will have to work with it and, hence, needs to understand that part. Behavioral code analysis is based on data from how the organization works with the code, and I use version-control data (e.g. Git) as the primary data source. More specifically, I look to identify hotspots. A hotspot is complicated code that the organization has to work with often. So it's a combination of static properties of the code (complexity, dependencies, abstraction levels, etc) and -- more important -- a temporal dimension like change frequency (how often do you need to modify the code?) and evolutionary trends. I have found that identifying and visualizing hotspots speeds up my on-boarding time significantly as I can focus my learning on the parts of the code that are likely to be central to the solution. In addition, a hotspot visualization provides a mental map that makes it easier to mentally fit the codebase into our head. There are a set of public examples and showcases based on the CodeScene tool here: https://codescene.io/showcase I have an article that explains hotspots and behavioral code analysis in more depth here: https://empear.com/blog/prioritize-technical-debt/ I also have a book, Software Design X-Rays: Fix Technical Debt with Behavioral Code Analysis, that goes into more details and use cases that you might find useful for working with large codebases: https://pragprog.com/book/atevol/software-design-x-rays |
Got any tips for possible problems I'll encounter along the way?