Hacker News new | ask | show | jobs
by royce 3033 days ago
Because:

1) In an online attack, against a properly-configured service, even if password spraying is used, only the first few thousand passwords can be tried before rate-limiting, CAPTCHAs, etc. kick in.

Would a user with a known leaked password at a different site be vulnerable to an online correlation attack? Yes. And that's why some big services supplement their approach by proactively searching for those leaks and forcing a password reset for those specific users.

2) In an offline attack, when the passwords are properly hashed with a modern slow hash, even an expensive GPU or FPGA cluster would take weeks to exhaust a 10,000 word dictionary against a large user corpus, and a significant amount of time even when a single user is targeted.

Would users with '123456' get cracked pretty quickly? Yes. And that's why the top X are forbidden - to make offline attackers have to dig deeper into their wordlists (and thereby also their pocketbooks) to crack a password in a useful amount of time.

2 comments

Eh, "We've got lots of users so it will take a long time to crack them all" isn't much of a defence.

I mean, if you've got Obama or Snowden or Taylor Swift or Logan Paul or whoever as a user, you think hackers wouldn't spent 2 hours of GPU time per account to crack their passwords?

I'm quite familiar with password attack scenarios.

If high-value targets are selecting passwords that would be vulnerable to a targeted cracking attack, the solution isn't to blacklist a half-billion passwords (when they could just as easily come up with literally trillions of other passwords that would also be bad, yet are not included in the blacklist). The solution is to show them how to manage their specialized threat model - 2FA, creating strong passphrases, using a password manager, etc.

I find it hard to believe that you could set a cutoff of passwords that have been leaked but that you could rely on an attacker not to try. These passwords are more useful guesses than anything a password cracker would make up out of components.

XKCD considers a password that's one of 2^28 possibilities "easy" to guess, and provides a well-regarded strategy [1] for coming up with a password that's one of 2^44. Passwords in this list are one of 2^29.

[1] https://xkcd.com/936/

Random passphrases are indeed a good idea. XKCD #936 advocates for 4 words randomly selected from a 6000-word dictionary, which is 6000^4, or ~1.296 × 10^15, which isn't actually that strong if the service in question has chosen a weak password hashing algorithm. When using pure bruteforce or masks (not a dictionary or hybrid attack) against a large-ish corpus of passwords (say, a few million) a system with 6 GTX 1080s can realistically try 8.2 billion SHA1-hashed passwords per second, which would exhaust the entire XKCD 936 keyspace in about 45 hours. (If you bump it up to five words from a 20,000 word dictionary, you get ~3.2x10^21 possibilities, which is better.). And if you focus on a single hash, that SHA1 rate jumps to ~32 billion hashes per second, which would take less than 15 hours.

At that speed, processing the entire Pwned Passwords list would almost take longer to read from disk and into memory than it would take to exhaust against a single password. Password cracking specialists would of course try raw wordlists first (And therefore "more useful", in a way) ... but we many other tools in their arsenal that generate far more than a half a billion candidate passwords. And at that rate, you can exhaust all 8-character passwords made up of printable ASCII - 95^8, ~6x10^15 - in a couple of days. Other techniques (mask, hybrid, rules) can achieve similar rates, and combinator attacks are slower but still pretty efficient.

By contrast, attacking bcrypt cost 12 on the same system can only try ~660 hashes per second - against a single hash. At that rate, if you just tell the attacker "it's somewhere in the Pwned Passwords list", it would take about 210 hours to exhaust the raw list, and 36 years to exhaust all 6-character passwords made up of printable ASCII.

In other words, if a service is storing passwords poorly, that service should be fixing that long before they should be trying to blacklist a half billion passwords. The purpose of blacklisting up front in the password-changing UI isn't to forbid a half-billion passwords. It's a way to reduce risk of online attack - and an opportunity to guide users towards better selection methods. There's a reason why Dropbox only blacklists the top 30K.

> which isn't actually that strong if the service in question has chosen a weak password hashing algorithm.

That only matters if you re-use the password in multiple sites.

If an attacker has access to the hash, that means they cracked the site already at the admin level and got into its user database.

They don't need to crack your password to gain any more access to that same site. (And they already have all the plain text personal info from your account.)

Your only additional problem now is if that password gives them access to your account on other sites that they haven't broken into yet.

The ultimate protection against that is not to have reused that password. That beats the stupidity of "password strength".

If a password is not reused, it has to be only strong enough to survive the five guesses before an account is locked out.

Password strength matters when hashes are public (like in classic Unix non-shadowed /etc/password files). Well, that's a bad idea, which is why we have shadowed password files. Shadowed password files may as well store passwords in cleartext; if those passwords are not reused anywhere, the situation is safe. Anyone who can see the cleartext is already root. If those cleartext passwords don't work on any other system, they are worthless to the attacker.

Thus password strength --- all the fussing with how we properly store passwords with a decently strong hashing function and salting --- is just a fallback strategy to protect password re-users.

> Shadowed password files may as well store passwords in cleartext; if those passwords are not reused anywhere, the situation is safe

Wait, what?

If they were randomly generated and of sufficient length, yes.

If they weren't randomly generated, even if not exactly reused, they are very likely to reveal the psychology of that user's password selection habits. This is of definite value to a focused attacker. Not only could it inform guessing passwords on other systems, it could also inform guessing that user's _next_ password on _this_ system.

> They don't need to crack your password to gain any more access to that same site.

Just because they have the hashes doesn't mean that they have other access. Hash lists are bought, sold, traded, and stolen all the time. Someone who possesses that particular hash may be multiple hops away from the group that originally acquired them.

Also, just because the database layer that the passwords are stored in is owned, does not mean that a particular target level of access has been acquired. Password storage can be abstracted into an entirely standalone subsystem, for which knowing, say, an admin of that system's password would be quite valuable.

It means that suppose the attacker can look in /etc/shadow (due to having root privs) and sees, in plain text, that the password of user "bob" is "correct-horse" (not anything fancy like "correct-battery-horse-staple"). But Bob doesn't use that password anywhere else. So what good is that piece of information to the attacker? On this system, attacker can just "su bob". On systems where attacker is not root, "correct-horse" doesn't get into bob's account.
> If they were randomly generated and of sufficient length, yes.

What does that buy you, if they are in plain text?

(Well, randomness quasi-guarantees that they are not re-used; I covered that.)

If we have passwords in plain text, issues about length related to cracking hashes is moot; the cracking that still matters is someone guessing at the login prompt, where we can lock out accounts after N attempts.

> What does that buy you, if they are in plain text?

Nothing. That's why I was agreeing with you for that subset.

But N may be smaller than you might think, when frequency data is also supplied by the API.

https://gist.github.com/roycewilliams/60b77640a962125b04ae67...

What about the other case - when they're not random, but also not reused ... such that the psychology of the user's password-selection methodology might be exposed?
> Hash lists are bought, sold, traded, ...

All only possible after the horse has escaped the barn.

> Someone who possesses that particular hash may be multiple hops away from the group that originally acquired them.

But if the hash is for a password that was only used on the original compromised system, it is useless, even if the password is recovered.

Just because the horse is out of the barn doesn't mean that the owner of the barn knows about it yet.