by Spivak
100 days ago
> On this note, a login failure is not an error

Login failure is like the most important error you'll track. A single login failure isn't necessarily actionable, but a spike of thousands of them for sure is. No single system has been more responsible for outages in my career than auth. And I get that it's annoying when they show up in your Rollbar, but sometimes "Login Failed" is the only signal you get that something is wrong.

A third-party IdP saying "nope" can be innocuous when it's a few people but a huge problem when it's because they let their cert/application token expire.

And I can already hear "it should be a metric with an alert," and you're absolutely right. Except that it requires devs to take the positive action of updating the metric on login failures, vs. doing nothing and letting the exception propagate up. And you just said login failures aren't errors, and "bad password" obviously isn't an error, so there's no need to update the metric on that and cause chatty alerts. Except, of course, that one time a dev accidentally changed the hashing algorithm. Everyone was really bad at typing their password that day for some reason.

Sounds like you agree with me. Re-read my comment. Errors are actionable individually. Warnings are actionable in aggregate.
You don't have to treat logs and metrics as separate; you can have rules on log counts without emitting a metric.
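To make "rules on log counts without emitting a metric" concrete, here is a minimal sketch assuming Python's standard `logging` module: a handler that counts WARNING-or-higher records on a logger within a sliding time window and calls an alert hook once the count crosses a threshold. The class name `LogCountAlertHandler`, the `auth.demo` logger, the threshold and window values, and the `on_alert` callback are all hypothetical, for illustration only; in practice this rule would more likely live in whatever log aggregation or alerting tool you already use.

```python
import logging
import time
from collections import deque
from typing import Callable, Deque, List


class LogCountAlertHandler(logging.Handler):
    """Hypothetical handler: alert when too many WARNING+ records land in a window.

    Nothing in application code has to update a metric; logging a warning on
    login failure is enough for the count-based rule to see it.
    """

    def __init__(
        self,
        threshold: int,
        window_seconds: float,
        on_alert: Callable[[int], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # Handler level WARNING: INFO/DEBUG records never reach emit().
        super().__init__(level=logging.WARNING)
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.on_alert = on_alert
        self.clock = clock  # injectable so the window can be tested deterministically
        self._timestamps: Deque[float] = deque()
        self._alerting = False

    def emit(self, record: logging.LogRecord) -> None:
        # Handler.handle() holds the handler lock around emit(), so the deque
        # is not mutated concurrently.
        now = self.clock()
        self._timestamps.append(now)
        # Expire records that fell out of the sliding window.
        while self._timestamps and now - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        count = len(self._timestamps)
        if count >= self.threshold and not self._alerting:
            # Fire once per spike; re-arm only after the count drops back down.
            self._alerting = True
            self.on_alert(count)
        elif count < self.threshold:
            self._alerting = False


def demo() -> List[int]:
    """Replay a hypothetical sequence of auth log lines against a fake clock."""
    alerts: List[int] = []
    fake_now = [0.0]
    handler = LogCountAlertHandler(
        threshold=3,
        window_seconds=60.0,
        on_alert=alerts.append,
        clock=lambda: fake_now[0],
    )
    log = logging.getLogger("auth.demo")
    log.propagate = False  # keep the demo from printing via root handlers
    log.setLevel(logging.INFO)
    log.addHandler(handler)
    try:
        events = [
            (0, logging.WARNING, "login failed: bad password"),
            (10, logging.WARNING, "login failed: bad password"),
            (15, logging.INFO, "login ok"),  # below WARNING, not counted
            (20, logging.WARNING, "login failed: IdP rejected token"),  # 3 in 60 s: alert
            (100, logging.WARNING, "login failed: bad password"),  # older records expire
            (105, logging.WARNING, "login failed: bad password"),
            (110, logging.WARNING, "login failed: bad password"),  # new spike: alert again
        ]
        for t, level, msg in events:
            fake_now[0] = float(t)
            log.log(level, msg)
    finally:
        log.removeHandler(handler)
    return alerts
```

Because the rule counts every login-failure warning regardless of cause, a flood of "bad password" failures (the changed-hashing-algorithm case) trips it just as an expired IdP cert would, while a handful of typos stays below the threshold.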