by lrem 885 days ago
Does anyone serious do this?

That’s an honest question, from a pretty experienced SRE.

2 comments

In a world of unicorns and rainbows, absolutely. In the real world, it's as you probably already know: it's not that easy in a complex enough system.

Quick counter-example for GP: what if the 500 spike is due to a spike in malformed requests from a single (maybe malicious) user?

A malformed request should not lead to a 500; it should be validated and handled.
Well, in the real world it might. That should trigger a bug report and a code fix, but not an incident. To tell the two cases apart, you suddenly need more complex and/or more specific queries in your monitoring system (or a good ML-based alert system), so complexity is already going up.
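
To make the "more specific query" point concrete, here is a rough sketch of a rule that separates a 5xx spike caused by one client from a spike that hits many clients. Everything in it is an assumption for illustration: the `RequestLog` record, the `classify_5xx_spike` name, and both thresholds are made up, and a real monitoring system would express this in its own query language.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestLog:
    # Hypothetical log record: who sent the request and what status came back.
    client_id: str
    status: int


def classify_5xx_spike(logs, min_errors=50, max_single_client_share=0.8):
    """Return "ok", "bug", or "incident" for one window of request logs.

    Thresholds are illustrative assumptions, not recommended values.
    """
    errors = [r for r in logs if 500 <= r.status <= 599]
    if len(errors) < min_errors:
        return "ok"  # too few server errors to count as a spike

    by_client = Counter(r.client_id for r in errors)
    _, top_count = by_client.most_common(1)[0]

    # If one client accounts for most of the errors, treat it as a bug to
    # file rather than an incident affecting everyone.
    if top_count / len(errors) >= max_single_client_share:
        return "bug"
    return "incident"
```

Even this toy version needs per-client attribution and two tunable thresholds, which is the added complexity being described.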
Query input validation is nearly a solved problem. If you don't validate and 500s are returned in this case, I would argue that is an incident.
You need to validate your inputs and return 4xx.
Yeah, and you also shall not write bugs in your code. The real world has bugs, even trivial ones.
If your service is returning 5xx, that is the definition of a server error; of course that is degraded service. Instead we have pointless dashboards that are still green an hour after everything is broken.

Returning 4xx on a client error isn't hard and is usually handled largely by your framework of choice.

Your argument is a strawman.
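
For reference, the "validate and return 4xx" pattern looks roughly like this when written by hand, without a framework. This is a minimal sketch under stated assumptions: `handle_request` and the `quantity` field are hypothetical, and most frameworks provide their own validation layer that does the equivalent.

```python
import json


def handle_request(body: str):
    """Return (status_code, response_dict) for a hypothetical order endpoint."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Malformed input is the client's fault: 400, not 500.
        return 400, {"error": "body is not valid JSON"}

    if not isinstance(payload, dict):
        return 400, {"error": "body must be a JSON object"}

    quantity = payload.get("quantity")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int) or quantity <= 0:
        return 400, {"error": "quantity must be a positive integer"}

    return 200, {"ordered": quantity}
```

Any input that slips past checks like these and raises inside the handler would still surface as a 5xx, which is the case the rest of the thread is arguing about.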

True; however, it also doesn’t impact other users and doesn’t justify reporting an incident on the status page.