by mteigers 1124 days ago
Ditto: my managed service at Google threw close to 3k 500s per second when I was still there. The causes were anything from cosmic rays and faulty hardware flipping bits to hard drive failures.

We did, however, aggregate similar 500s into groups, and those groups did get looked at, but there was no way we could have looked at every individual error.
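For anyone curious what "group similar 500s" can look like in practice, here is a minimal sketch. It is not how Google's tooling worked; it assumes errors arrive as plain message strings and uses made-up normalization rules (hex IDs and bare numbers replaced by placeholders) so that messages differing only in volatile tokens land in the same bucket.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical normalization rules: hex IDs first, then bare integers.
_HEX = re.compile(r"\b0x[0-9a-fA-F]+\b")
_NUM = re.compile(r"\b\d+\b")


def signature(message: str) -> str:
    """Collapse an error message into a coarse grouping key."""
    sig = _HEX.sub("<hex>", message)
    return _NUM.sub("<n>", sig)


def group_errors(messages: Iterable[str], top: int = 3) -> List[Tuple[str, int]]:
    """Count messages per signature and return the `top` most common groups."""
    return Counter(signature(m) for m in messages).most_common(top)


# Usage with invented log lines (not real data).
logs = [
    "backend 17 timed out after 3000 ms",
    "backend 42 timed out after 3000 ms",
    "checksum mismatch at 0xdeadbeef",
    "checksum mismatch at 0x1f2e",
    "backend 9 timed out after 2999 ms",
]
print(group_errors(logs))
# prints [('backend <n> timed out after <n> ms', 3), ('checksum mismatch at <hex>', 2)]
```

Looking at the handful of top buckets instead of thousands of raw errors per second is what makes review tractable at all.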

The other thing is that with resilient infrastructure, who cares about an occasional 500? Back off and retry. No harm done.
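"Back off and retry" usually means exponential backoff with jitter. A minimal sketch, assuming a hypothetical `ServerError` exception stands in for a 5xx response and that the call is safe to repeat; the parameter names and defaults are illustrative, not from any particular library.

```python
import random
import time
from typing import Callable


class ServerError(Exception):
    """Hypothetical exception standing in for an HTTP 5xx response."""

    def __init__(self, status: int) -> None:
        super().__init__(f"server returned {status}")
        self.status = status


def retry_with_backoff(
    call: Callable[[], str],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run `call`, retrying on ServerError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except ServerError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Upper bound doubles per attempt (base, 2*base, 4*base, ...) up to max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random time in [0, cap] so clients don't retry in lockstep.
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")  # the loop always returns or raises


# Usage: a flaky call that fails twice with a 500, then succeeds.
attempts = []


def flaky() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ServerError(500)
    return "ok"


print(retry_with_backoff(flaky, sleep=lambda _: None))  # prints "ok"
```

"No harm done" holds only when the request is idempotent; retrying a non-idempotent write can duplicate it, and capping retries keeps a fleet of clients from amplifying a real outage.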


User experience might indeed be unaffected by these errors, but errors of a less stochastic nature will affect it. The former obscure the latter, and that's probably the point of the article.
It's likely that some fraction of those errors had a tangible impact on user experience.
3k? What percentage of requests is that?