by cperciva 4057 days ago
I would have sent out an email to the mailing lists earlier; but since at each point I thought I was "one change away" from fixing the problems, I kept on delaying said email until it was clear that the problems were finally fixed.

This ties in to the last lesson I mentioned at the bottom:

5. When performance drops, it's not always due to a single problem; sometimes there are multiple interacting bottlenecks.

Every time I identified a problem, I was correct that it was a problem -- my failing was in not realizing that there were several things going on at once.


> Every time I identified a problem, I was correct that it was a problem -- my failing was in not realizing that there were several things going on at once.

Very common! One thing that's been helpful for us is establishing predefined system performance thresholds that, if exceeded, initiate the chain of events that will lead to customer communication. "If X% of requests are failing, then we had better advertise that the system is degraded." Discussing and setting these thresholds in advance and the expectation that they'll result in communication helps drive the right outcome. It's not perfect, because one is always tempted to make a judgment call in the circumstance, which is vulnerable to the same effect, but it's a good start.
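To make the idea concrete, here is a minimal sketch of such a predefined threshold check. Everything in it is hypothetical: the names `DegradationThreshold` and `should_announce` and the 5% figure are illustrative assumptions, not a description of any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DegradationThreshold:
    """A threshold agreed on in advance, before any incident (hypothetical)."""
    name: str
    max_failure_rate: float  # fraction of requests allowed to fail, 0.0 to 1.0


def should_announce(failed: int, total: int, threshold: DegradationThreshold) -> bool:
    """Return True when the observed failure rate exceeds the agreed threshold."""
    if total <= 0:
        # No traffic observed in this window, so there is nothing to judge.
        return False
    return failed / total > threshold.max_failure_rate


# Illustrative value only; the real "X%" is whatever the team agrees on.
REQUEST_FAILURES = DegradationThreshold(name="request failures", max_failure_rate=0.05)

if should_announce(failed=12, total=100, threshold=REQUEST_FAILURES):
    print(f"Degraded: {REQUEST_FAILURES.name} above threshold, notify customers")
```

The point of keeping the check this mechanical is that the decision to communicate gets made ahead of time, not in the middle of the incident.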

Thanks for sharing!

I tend to get to debug problems like this (usually in third-party code I don't know the internals of) pretty frequently. My experience has been that it tends to follow a curve: MOST of the time, the problem is simple and you can quickly dispatch it. The scary (or fun, depending on your perspective) part hits when you pass the first level and there are still problems, and you don't know if it's two or ten levels deeper. Then you get into that crazy test/optimize cycle and crawl out two weeks later wondering when you last ate.

That totally jibes with what I found "reassuring," in a sense: that even very smart people sometimes get hit with inadvertent "multiple problems looking like a single issue" situations.