Hacker News
by cyphar 3483 days ago
It looks like the exponential model isn't a good fit at all -- in all cases it undershoots the decay at the start of the graph and overshoots at the tail end. So while it might "look close", there is some systematic effect that your model doesn't account for. In particular, I don't agree that all code in a codebase has a constant risk of being replaced -- most projects have different components that are developed at different rates. Some components are legacy code that is likely never to change, while other parts are under rapid development. In fact, I'd argue that's why the tail is so long -- legacy code is called "legacy" for a reason. And the head of the graph dives down so quickly because code under rapid development has a higher chance of being replaced.
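A small synthetic sketch of this argument follows. The two-component split, the rates, and the variable names are assumptions made up for illustration, and the log-space least-squares fit pinned to 1 at t = 0 is not necessarily how the article fit its curves. It only checks whether mixing a fast-churning component with a slow legacy one is enough to make a single exponential miss in the way described.

```python
import math

# Hypothetical codebase (not real data): half the lines sit in a component that
# churns quickly (rate 1/year), half in legacy code that rarely changes (0.05/year).
FAST_RATE, SLOW_RATE = 1.0, 0.05
years = list(range(11))  # t = 0..10 years


def surviving_fraction(t):
    """Fraction of the original lines still present after t years."""
    return 0.5 * math.exp(-FAST_RATE * t) + 0.5 * math.exp(-SLOW_RATE * t)


data = [surviving_fraction(t) for t in years]

# Single-exponential model exp(-k t), pinned to 1 at t = 0. Least squares on log(y):
# minimizing sum((log y + k t)^2) over k gives k = -sum(t log y) / sum(t^2).
k = -sum(t * math.log(y) for t, y in zip(years, data)) / sum(t * t for t in years)
model = [math.exp(-k * t) for t in years]

for t, y, m in zip(years, data, model):
    # Negative residual: the fit decays too slowly here; positive: too quickly.
    print(f"t={t:2d}  data={y:.3f}  fit={m:.3f}  residual={y - m:+.3f}")
```

On this synthetic data the residuals are negative for the first seven years and positive from year 8 on: the same undershoot-then-overshoot pattern, produced by nothing more than two components changing at different rates.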

Agreed; you can reject pure exponential decay a priori: a codebase has some minimal set of functionality that it specifies, and there is some minimal set of lines required to provide that functionality. If you believe this, then the "decay" curves must approach a nonzero asymptote. The asymptote only gets lowered by breaking backwards compatibility, as in projects like Angular, which went ahead and threw away a ton of core functionality in moving from 1.x to 2.x.

The decay isn't actually decay at all; it represents the complement of lines that define peripheral functionality. Lines defining peripheral functionality typically require modification (refactoring) as additional functionality is added. The asymptote, which all the curves illustrate but the fit cannot capture, represents the proportion of irreducible core functionality.

Simply adding a constant to the fit might fix all the problems.
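A minimal sketch of that suggestion, fitting y = C + A*exp(-k t) instead of A*exp(-k t). Everything here is an assumption for illustration: the synthetic data (a hypothetical 30% floor of core lines), the helper names `fit_at_floor` and `fit_exp_with_floor`, and the crude grid scan over C. A real analysis would more likely hand all three parameters to a nonlinear least-squares routine such as `scipy.optimize.curve_fit`.

```python
import math


def fit_at_floor(ts, ys, c):
    """Fit y = c + A*exp(-k t) for a fixed floor c; returns (A, k, sse).

    log(y - c) = log(A) - k t is a straight line, so ordinary least squares
    on (t, log(y - c)) gives A and k. sse is measured on y itself.
    """
    zs = [math.log(y - c) for y in ys]  # requires every y > c
    n = len(ts)
    mt, mz = sum(ts) / n, sum(zs) / n
    slope = sum((t - mt) * (z - mz) for t, z in zip(ts, zs)) / sum((t - mt) ** 2 for t in ts)
    a, k = math.exp(mz - slope * mt), -slope
    sse = sum((y - (c + a * math.exp(-k * t))) ** 2 for t, y in zip(ts, ys))
    return a, k, sse


def fit_exp_with_floor(ts, ys, step=0.001):
    """Scan candidate floors 0, step, 2*step, ... below min(ys); keep the best."""
    best = None
    for i in range(int(min(ys) / step)):
        c = i * step
        a, k, sse = fit_at_floor(ts, ys, c)
        if best is None or sse < best[3]:
            best = (c, a, k, sse)
    return best


# Hypothetical data: 30% of lines are irreducible "core", the rest decays at rate 0.5/year.
ts = list(range(11))
ys = [0.3 + 0.7 * math.exp(-0.5 * t) for t in ts]

C, A, k, sse_floor = fit_exp_with_floor(ts, ys)
_, _, sse_plain = fit_at_floor(ts, ys, 0.0)  # c = 0 is the plain exponential
print(f"floor={C:.3f}  A={A:.3f}  k={k:.3f}  SSE={sse_floor:.1e}  (plain exponential SSE={sse_plain:.1e})")
```

On this noise-free data the scan recovers the floor of 0.3 and the decay rate of 0.5, while the plain exponential (C = 0) leaves a much larger squared error. Real blame data would be noisy, so the recovered floor would be an estimate rather than an exact value.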

This is a close fit to Fechner's Law (not Weber's) relating perceived intensity to stimulus in animal vision. Also, thanks so much, author.
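For reference, the usual textbook statement of Fechner's law, with p the perceived intensity, S the stimulus intensity, S_0 the threshold stimulus below which nothing is perceived, and k a constant that depends on the sense being measured. Weber's law, by contrast, only says that the just-noticeable change in a stimulus is a fixed fraction of the stimulus.

```latex
p = k \ln \frac{S}{S_0} \qquad \text{(Fechner)}
\qquad\qquad
\frac{\Delta S}{S} = \text{const.} \qquad \text{(Weber)}
```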
It might be useful to fit an exponential for each year's code. The oldest code seems to decay to some constant and then linger forever at some LOC; it does not decay to zero LOC. It may be that new code is inherently inefficient and gets cleaned up over time. Also, some functionality may be abstracted and built into new code and removed from the old. Over time you end up with a chunk of code doing its one thing and doing it well. This is all speculation, of course. I hope this gets looked at a lot more.
Agreed... looking at the graphs, the fit screams "non-exponential" to me.