by facundo_dbx 67 days ago
Author here :) We did have high-level metrics and expectations for how this change would behave, but a couple of factors happening in parallel made it much harder to reason about in practice.

Data in these systems moves slowly and with a lot of inertia, so the effects show up gradually and can lag behind the change itself. On top of that, the impact wasn’t uniform. Most of the overhead came from a small subset of volumes, so it took time to isolate what was actually driving the increase. These systems are hard to test at scale!