by cperciva
4057 days ago
I would have sent an email to the mailing lists earlier, but at each point I thought I was "one change away" from fixing the problems, so I kept delaying that email until it was clear that the problems were finally fixed. This ties into the last lesson I mentioned at the bottom: 5. When performance drops, it's not always due to a single problem; sometimes there are multiple interacting bottlenecks. Every time I identified a problem, I was correct that it was a problem -- my failing was in not realizing that there were several things going on at once.
Very common! One thing that's been helpful for us is establishing predefined system performance thresholds that, if exceeded, initiate the chain of events that will lead to customer communication. "If X% of requests are failing, then we had better advertise that the system is degraded." Discussing and setting these thresholds in advance, along with the expectation that crossing them will result in communication, helps drive the right outcome. It's not perfect, because one is always tempted to make a judgment call in the moment, which is vulnerable to the same effect, but it's a good start.
Thanks for sharing!
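For what it's worth, here is a rough sketch of what "thresholds decided in advance" could look like in code. Everything in it is hypothetical: the `Threshold` class, the `required_actions` helper, the 5% and 25% figures, and the action strings are placeholders for illustration, not what any real system uses. The point is only that the mapping from failure rate to communication is written down ahead of time rather than decided during the incident.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    name: str
    max_failure_rate: float  # fraction of failed requests, 0.0 to 1.0
    action: str              # communication step to take once exceeded


# Hypothetical thresholds agreed on in advance; the numbers are placeholders.
THRESHOLDS = [
    Threshold("degraded", 0.05, "post a status-page notice"),
    Threshold("outage", 0.25, "email the mailing lists"),
]


def required_actions(failed: int, total: int) -> list[str]:
    """Return every communication step whose threshold has been exceeded."""
    if total <= 0:
        # No traffic observed, so there is no failure rate to judge.
        return []
    rate = failed / total
    return [t.action for t in THRESHOLDS if rate > t.max_failure_rate]
```

Calling `required_actions(10, 100)` gives a 10% failure rate, which crosses only the first placeholder threshold. The judgment-call temptation is still there, but at least the default answer is already on paper.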