| HN Mirror

> I think I’m the odd one who actually like working with legacy code.

I had a fun bit of legacy code wrangling a month or so back. I looked at production profiles of some very compute-heavy jobs (we have great tools for that) and found we were spending ~half of my pay on cloud compute in this one library function -- deep equality checks for a kind of object.

The equality-checking code itself is very slow, significantly slower than hand-written alternatives, but the objects are complicated (though luckily tree-like in structure) so hand-writing equality comparison functions would be a nightmare.

Many of these structures had "id" fields, though. Gasp, could we just compare ids instead of doing deep equality checks?

So I asked around, and nobody knew. Nobody even knew what the code was for. It'd be trivial to replace the deep equality checks with id comparisons, but we can't do it.

Seeing how some legacy code works is one thing. Data comes in here, data goes out there. Knowing which invariants the code is trying to maintain, though, what properties it assumes about the data here or there (Sorted? Unique? Up to date? All belongs to the same user?) makes making changes difficult, though. It means you need to write defensive code that the original authors would find ridiculous, because you don't have their institutional knowledge, and can't derive it locally in the code, and can't inspect it at runtime (or be sure that prod won't send a counter-example the day after you deploy to production.)