Hacker News new | ask | show | jobs
by mns 842 days ago
There are quite some harsh comments here below. You can't plan for every possible failure point, who knows what part of a system/infra out of everything that they have went down and triggered this behaviour. Some things you just can't catch/predict. Especially in huge systems like theirs. I would expect people here to understand things like these and not just call people names for something like this, we all know things seem simple/clear from the outside, but the job of debugging and fixing something like this take quite some effort.
2 comments

This is a company with one of the largest digital infrastructures in the world. An outage is understandable, inability to tell they're having an outage and inform users appropriately is not. Stop making excuses for people who are literally awash in resources.
> Stop making excuses for people who are literally awash in resources.

This is a pretty weird outlook to have - looking at any group awash with resources, whether it be governments or other companies, and you can clearly see that even with those resources, failures still happen.

You can jump up and down and pretend that this is solvable, or you can look at reality, look at all the evidence of this happening over and over to almost everyone, and conclude with some humility that these things just happen to everyone.

(Looking this reality in the face is one of the things motivating my beliefs around e.g. AI safety, climate change, etc.)

> An outage is understandable
It is always better for the company's rep for the issue to have been on your end. Admitting fault comes with a potential liability. It's gaslighting written as an SLA
You can't plan for every contigency, but you can reserve potentially scary message for situations where you know they are correct. An unpected error state should NOT result in a "invalid credentialiald error".
This is the nature of credentials errors. The more information you give, the more you're telling an untrusted and therefore assumed-hostile agent.

I hate it because it's bad UX, but that's the thinking behind it.

Pushing people to unnecessarily reset credentials increases risk. Not only does it increase acute risk, but it also decreases the value of the signal by crying wolf.

The argument here is the kind of nonsense cargo cult security that pervades the industry.

I think this argument falls flat on two axes:

- in general, if the system is broken enough to be giving false-negatives on valid credentials, it's broken enough that there isn't much planning to be done here because the system's not supposed to break. So if they give me "Sorry, backend offline" instead of "invalid credential," they've now turned their system into an oracle for scanning it for queries-of-death. That's useful for an attacker.

- in the specifics of this situation, (a) credential reset was offline too so nobody could immediately rotate them anyway and (b) as a cohort, Facebook users could stand to rotate their credentials more often than the "never" that they tend to rotate them, so if this outage shook their faith enough that they changed their passwords after system health was restored... Good? I think "accidentally making everyone wonder if their Facebook password is secure enough" was a net-positive side-effect of this outage.

So your approach to security is to never admit that an application had an error to a user, but to instead gaslight that user with incorrect error messages that blame them?

This is security by obscurity of the worst kind, the kind that actively harms users and makes software worse.

No. My approach to security is to never admit that an application had an error to an unauthenticated user.

That information is accessible to two cohorts:

- authenticated users (sometimes; not even authenticated users get access to errors as low-level as "The app's BigTable quota was exceeded because the developers fucked up" if it's closed source cloud software)

- admins, who have an audit log somewhere of actual system errors, monitoring on system health, etc.

Unfortunately, I can't tell if the third cohort (unauthenticated users) is my customers or actively-hostile parties trying to make the operation of my system worse for my customers, so my best course of action is to refrain from providing them information they can use to hurt my customers. That means, among other things, I 403 their requests to missing resources instead of 404ing them, I intentionally obfuscate the amount of time it takes to process their credentials so they can't use timing attacks to guess whether they're on the right track, I never tell them if I couldn't auth them because I don't recognize their email address (because now I've given them an oracle to find the email addresses of customers), and if my auth engine flounders I give them the same answer as if their credentials were bad (and I fix it fast, because that's impacting my real users too).

To be clear: I say all this as a UX guy who hates all this. UX on auth systems is the worst and a constant foil to system usability. But I understand why.