| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by hinkley 334 days ago

Global shared state. High memory overhead. Evictions mean that the beginning of your call can see potentially different values than the end, and most of us write code like this as if it is transactional. We cannot handle a user’s account being closed or upgraded to an admin in the middle of a transaction. So using a cache to keep refetching the user data instead of passing it with the call graph is fraught (and makes unit testing terrible).

As a concrete example, my coworker was trying to walk a DAG with many leaf nodes that were repeated (think dependency tree). He used a cache to do this. It had bugs (2 I knew, but 6 I found). It was one of three large calculations we made at startup, and it should have been the least of these. But it ate memory like candy and added a lot to our start time which was slowing down deployments.

So I replaced it with a DP solution that just looked like normal code, reducing startup memory by ~80 MB and start time by half. By walking the tree and making a list of all the nodes in dependency order and then processing them once, after the list was compiled. And for the microservices and dev tools where we ran this same code the startup time became negligible.

Part of the problem is that the library we used did not like being called recursively, so you had to dance around reentrancy bugs and that cost time and space. But a linearized run has no recursion.

Edit to add: I realize now that I’ve left the conclusion ambiguous.

His two main sins were one, that the cache lived beyond the useful lifetime, and two, that even with the cache some amount of post-processing was required. By unrolling the tree and running each calculation once I reduced the post processing by about 3/4 and climbing as the tree grew in width and height.