by pussypusspuss 3184 days ago
SHA1 hash? You’ve gotta be fucking kidding

It's from a database from 2012. Wasn't that cutting edge in 2012? What kind of sick shit will we be encoding our stuff with 6 years from now that will make bcrypt seem laughably ill-suited?
The most popular post that ever ran on Matasano's security blog was the one where I encouraged people to migrate to bcrypt. In 2007. Bcrypt, of course, is much older; Niels and David invented it as the standard password format for OpenBSD back in 1999 --- and bcrypt was a response to FreeBSD's iterated salted hash format, which also had a work factor, and is years older still.

Today, in 2017, bcrypt remains a sound recommendation. You can do better, but for password databases on websites, not materially better.

Salted SHA-1 hashes (salted SHA-anything hashes) were malpractice in 2012.

> You can do better, but for password databases on websites, not materially better.

Do you mean using scrypt? What do you mean by materially better?

No, I mean bcrypt.

Scrypt is better than bcrypt, but mostly not in ways that make much of a difference in 2017.

PBKDF2 comes close to being materially worse than bcrypt and scrypt, because it's especially straightforward to accelerate on modern hardware, but even PBKDF2 is fine.

For the most part, as long as you're using anything with a KDF-like design for your password hash, a compromise of your password database is going to reveal the very terrible passwords and only those passwords; the rest will be too costly to crack.

Right now given the choice I'd use scrypt and go slightly out of my way to get it (if there was a good 3rd party library for it and bcrypt was in the standard library and I was like a "yarn add" away from having it, I'd take that step), but I would not convert a bcrypt site to scrypt.
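
For anyone wondering what reaching for scrypt looks like in practice, here is a minimal sketch using Python's standard-library `hashlib.scrypt` (it needs a Python build linked against OpenSSL 1.1 or newer). The cost parameters, the 16-byte salt, and the helper names `hash_password` and `verify_password` are assumptions for illustration, not recommendations from this thread; tune them for your own hardware.

```python
import hashlib
import hmac
import os
from typing import Tuple

# Assumed cost parameters, for illustration only.
SCRYPT_N = 2**14              # CPU/memory cost, must be a power of two
SCRYPT_R = 8                  # block size
SCRYPT_P = 1                  # parallelism
MAXMEM = 64 * 1024 * 1024     # memory ceiling; scrypt needs roughly 128 * N * r bytes


def _derive(password: str, salt: bytes) -> bytes:
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=SCRYPT_N,
        r=SCRYPT_R,
        p=SCRYPT_P,
        maxmem=MAXMEM,
        dklen=32,
    )


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key) for a new password."""
    salt = os.urandom(16)  # fresh random salt per password
    return salt, _derive(password, salt)


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the key and compare in constant time."""
    return hmac.compare_digest(_derive(password, salt), expected)
```

Store the salt and the parameters next to the derived key so the work factor can be raised later without breaking existing records.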

Considering that PBKDF2 has adjustable difficulty parameters would you still say it's worse if very high difficulty parameters are chosen?
It has to do with the defender's costs versus the attacker's. PBKDF2, which is usually instantiated with SHA-2, is still a lot cheaper for the attacker than for the defender even with a huge number of rounds, since the attacker can use GPUs or ASICs, which need fewer transistors per hash and run many calculations in parallel, while defenders usually use CPUs. On the other hand, bcrypt, scrypt, and Argon2 don't give the attacker much of an advantage over a CPU, since GPU and ASIC implementations of them are expensive and memory-bound.

PS My measurements show that a pure JavaScript implementation of scrypt is better than the fast native PBKDF2 provided by the WebCrypto API or Node.js at the same running time.

PPS But yeah, if you can't use bcrypt/scrypt/Argon2, but can use PBKDF2 with a high number of rounds, sure, do it.

Thanks for the reply. I appreciate the added explanation.
Yes to scrypt, but these days, Argon2[1] is the best.

"better" means "cannot be calculated much faster on GPU or even FPGA because it requires a lot of RAM".

"materially better" probably means "less than a million of hashes per second on eight Nvidia GTX 1080 running hashcat"[2], so Django and scrypt are both good (adjust work factor as needed, of course).

1 - https://en.wikipedia.org/wiki/Argon2

2 - https://gist.github.com/epixoip/a83d38f412b4737e99bbef804a27...

Thanks for the reply, and added info on being materially better.
> Salted SHA-1 hashes (salted SHA-anything hashes) were malpractice in 2012.

I'm pretty sure this is still the only option on Google App Engine. You can't upload C code, so bcrypt isn't an option.

You can't store passwords as salted SHA-x hashes. It's not OK to do that. If you have SHA-anything, you have PBKDF2; use that.
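
A minimal sketch of that advice, assuming Python: `hashlib.pbkdf2_hmac` ships in the standard library, so anywhere SHA-256 is available, PBKDF2 is too. The iteration count, the salt length, and the function names below are illustrative assumptions, not values anyone in this thread specified.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 200_000  # assumed work factor; raise it as far as your login latency allows


def pbkdf2_hash(password: str, iterations: int = ITERATIONS) -> Tuple[bytes, int, bytes]:
    """Return (salt, iterations, derived_key) using PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, key


def pbkdf2_verify(password: str, salt: bytes, iterations: int, expected: bytes) -> bool:
    """Recompute with the stored salt and iteration count, then compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(key, expected)
```

Keeping the iteration count in the stored record is what lets you raise it later for new and re-hashed passwords.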
It has never been the only option on Google App Engine (GAE). Bcrypt is exactly what I used in my GAE app back in 2009, though I have since moved on to scrypt. Bcrypt can be implemented in non-C languages, and there are libraries available for all the languages that are supported on GAE. If you're worried about Python performance, then you can have the bcrypt function in a separate module written in Java/Go.
Malpractice? Someone using SHA-1 for password storage wouldn't even be a medium severity issue on a modern pentest.

I agree with you, but it was very strange to discover that password storage is basically ignored in pentests. Especially after years of you drumming it up as a big deal.

Using SHA-1 for password storage would be sev:low in a pentest. There are a lot of other sev:low things that you would certainly agree are signs of incompetence. Unsoundness of engineering and vulnerability impact are almost orthogonal.
The issue is that companies can basically ignore sev:low findings. "Malpractice" implies that they need to care; they do not.

I wish they did. It would be nice if they were forced to care. But it wouldn't block them from being declared secure by a pentest. Low-severity findings are findings, yes, but they don't have the same pull as medium or high severity vulns.

All of this is true for storing passwords in plaintext, too. If some company leaked plaintext passwords, people would be outraged. Yet pentests would still give that company a pass, because plaintext password storage is sev:low.

I understand what you're saying, but second-order findings on pentests don't get high severity, no matter how important a sign of unsoundness they are. Severity and importance are also somewhat orthogonal.
It was never cutting edge. It was half informed lazy coder homemade crypto in 2012.

SHA1 is a fast hash. It's designed so that computing lots of SHA1 hashes in a short time is tractable. This is independent of whether it has collisions and is considered broken. It was fast from day 1. Fast hashes are not suitable for protecting passwords. They were never suitable for protecting passwords.

I can only speak to what was mainstream. In my sphere at the time, SHA1 was cutting edge; most of my peers were on MD5. The best among us were recommending SHA1.
I don't want to be too much of a jerk about this because I get that this is an expert subject but if the best among you were recommending salted SHA-anything in 2012, the best among you were committing professional malpractice.

Honestly, I feel like when we wrote that dumb bcrypt post in 2007, it was already a bit negligent to be using unstretched general-purpose hashes for password storage. The BSDs used better hashes in the 1990s.

It was not at all cutting edge. In 2012 I helped a number of companies move from bcrypt to scrypt.
Which is what the article says Disqus did: they moved /from/ sha1 /to/ bcrypt in 2012. Same as the companies you helped, in 2012.
No, those companies did not use SHA1 in 2012 or any time close to then. They used bcrypt until they upgraded to scrypt.

SHA1 was useless for passwords long before then.

SHA-1 has been known to be vulnerable since 2005, and even in 2012 SHA-2 and SHA-3 were recommended.

Nonetheless, you have a point!

>SHA-1 has been known to be vulnerable since 2005, and even in 2012 SHA-2 and SHA-3 were recommended.

FYI, the requirements for a password hash function are significantly different from those for a cryptographic hash function. The vulnerabilities you're talking about don't affect any of those properties. Password hashes only need to have preimage resistance and (more importantly) be slow, so as to limit offline attacks.

This is pretty much correct. It doesn't much matter what cryptographic hash you use to store secrets, and all of the general-purpose cryptographic hashes are bad password hashes. Salted SHA-3 would not be materially better than salted SHA-2 here.
Significantly better than using MD5 or storing in plaintext in 2012 (both of which would have been likely in 2012).

And in 2012 the current breaks from this year were not yet known. Some considered sha1 to be in its twilight, but it was not 'broken' yet at that time.

It is in fact not significantly better, for this purpose, than MD5.
That assertion is much easier to make now, with the knowledge we have in 2017, five years later.

But without knowledge of what was coming for sha1 in five years, back in 2012 it would have been a much better choice than either MD5 or plaintext storage.

However, even today, with the knowledge we now have regarding sha1, if one's choices are limited (for some strange reason) to only sha1 or MD5, sha1 is still a better choice than MD5. Yes, sha1 is weak, and it should clearly not be used for any new designs, but sha1 is still stronger than MD5.

Also note, the 2012 date was when they last used sha1, not when they started using it. That fact is somewhat critical to keep in mind. They last used sha1 in 2012. What got leaked were some leftover hashed passwords that never got updated to bcrypt that were still hanging around in their database (probably because those accounts have never logged in for the last five years and been forced through a password change).
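
The leftover hashes described above are what an upgrade-on-login migration leaves behind: a record can only be re-hashed when the user supplies the plaintext password again. A minimal sketch of that pattern, under loud assumptions: the `Account` record layout is hypothetical, and scrypt from Python's standard library stands in for bcrypt (which is a third-party package in Python). Nothing here is taken from the actual system the article describes.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical record layout, for illustration only.
    scheme: str    # "sha1" for legacy records, "scrypt" after upgrade
    salt: bytes
    digest: bytes


def _scrypt(password: str, salt: bytes) -> bytes:
    # Assumed cost parameters; scrypt is swapped in here for bcrypt.
    return hashlib.scrypt(password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1,
                          maxmem=64 * 1024 * 1024, dklen=32)


def login(account: Account, password: str) -> bool:
    """Verify the password and upgrade a legacy record on success."""
    if account.scheme == "sha1":
        candidate = hashlib.sha1(account.salt + password.encode("utf-8")).digest()
    else:
        candidate = _scrypt(password, account.salt)
    if not hmac.compare_digest(candidate, account.digest):
        return False
    if account.scheme == "sha1":
        # We hold the plaintext only at this moment, so re-hash now.
        account.salt = os.urandom(16)
        account.digest = _scrypt(password, account.salt)
        account.scheme = "scrypt"
    return True
```

Accounts that never log in again keep their legacy hash, which is why old records can still surface in a dump years later.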

No. For similar reasons, salted SHA-2 is also not materially better than MD5. You think this is about the strength of the underlying cryptographic hash, but that has in fact very little bearing on the strength of the password hash construction.
Clearly there is some critical piece of knowledge that I'm lacking, so please help me understand where my misunderstanding lies.

The article announcing the breach contains the term "SHA1" in exactly two places: "passwords (hashed using SHA1 with a salt;" and "password hashing algorithm from SHA1 to bcrypt".

Absent evidence to the contrary (and the article provides none), I am reading "hashed using SHA1 with a salt" to mean they used this construction:

    Hp = H(S||P) or
    Hp = H(P||S)
    where:
    S is a salt (derivation method unstated)
    P is the plaintext password
    || is byte concatenation
    H( ) is a hash function (sha1 in this specific case)
         applied only once to the input bytes
    Hp is the "hashed salted password"
How does the strength of the chosen hash not have a direct bearing on the strength of the construction H(S||P) (or H(P||S))? The construction is nothing but the chosen hash. What am I misunderstanding here?
Forget about the strength of the underlying hash. That's not how you recover passwords from hashed password databases. In reality, the way you recover passwords is to take a dictionary starting at AARDVARK and work your way to ZEBRA and every alphanumeric string in between, hashing each one and comparing it to the target hash. Because MD5, SHA1, SHA2, Blake, Blake2, and SHA3 are all designed to be as fast as possible, this attack is extremely effective, and can be accelerated dramatically with GPUs.
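
To make the attack concrete, here is a toy sketch in Python. It assumes the single-pass H(S||P) construction with SHA-1, and the four-word list and the "leaked" record are made up for illustration. A real attack would use a huge wordlist with mangling rules and run on GPUs.

```python
import hashlib
import os
from typing import Iterable, Optional


def salted_sha1(salt: bytes, password: str) -> bytes:
    """The naive construction H(S || P): one SHA-1 pass over salt + password."""
    return hashlib.sha1(salt + password.encode("utf-8")).digest()


def dictionary_attack(salt: bytes, target: bytes, candidates: Iterable[str]) -> Optional[str]:
    """Hash every candidate with the known salt and compare against the leaked hash."""
    for guess in candidates:
        if salted_sha1(salt, guess) == target:
            return guess
    return None


# Hypothetical leaked record: the salt sits next to the hash, so the attacker has both.
salt = os.urandom(8)
leaked_hash = salted_sha1(salt, "zebra")

wordlist = ["aardvark", "password", "letmein", "zebra"]
recovered = dictionary_attack(salt, leaked_hash, wordlist)
print(recovered)
```

Nothing in this loop depends on collisions or any other weakness of SHA-1. The only thing that matters is how cheap each guess is.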

The "password hashes" PBKDF2, bcrypt, scrypt, and Argon2 are all designed, the same way a KDF is designed, to mitigate this attack. All of them have a "work factor" that requires you to iterate the underlying hashing primitive (which might very well be SHA2) many times before arriving at the answer.
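
A rough way to see what a work factor buys, sketched in Python: time one guess against single-pass salted SHA-1 and one guess against PBKDF2-HMAC-SHA256. The 100,000 iteration count, the fixed salt, and the helper name `seconds_per_guess` are assumptions for the demo, and the exact ratio will vary by machine.

```python
import hashlib
import time


def seconds_per_guess(fn, repeats: int) -> float:
    """Average wall-clock time of one call to fn over `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


salt = b"0123456789abcdef"  # fixed salt, acceptable only for a timing demo
guess = b"hunter2"

# One guess against the naive construction: a single SHA-1 call.
fast = seconds_per_guess(lambda: hashlib.sha1(salt + guess).digest(), repeats=10_000)

# One guess against PBKDF2 with an assumed work factor of 100,000 iterations.
slow = seconds_per_guess(
    lambda: hashlib.pbkdf2_hmac("sha256", guess, salt, 100_000), repeats=3
)

slowdown = slow / fast
print(f"single SHA-1: {fast:.2e} s per guess")
print(f"PBKDF2 x100k: {slow:.2e} s per guess")
print(f"slowdown:     {slowdown:,.0f}x")
```

The defender pays that cost once per login; the attacker pays it once per guess, for every guess against every account.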

SHA1 and SHA2 aren't password hashes. That's what people here keep trying to explain. None of the well-understood flaws in MD5 and SHA1 are really relevant to the password hash setting. They're a disaster for cryptographic signature constructions, but they do not matter at all for passwords.

SHA1 hasn't been the recommended best practice for a very long time. (Really, ever.) Bcrypt dates back to 1999. Even if you give it 10 years for evaluation, it would have to be considered by 2009. And indeed it was recommended in 2007, 5 years before this breach. RFC2898 (PBKDF2) came out in 2000, 12 years before this breach. Scrypt was released in 2009, so I could understand not adopting it by 2012 out of concern for insufficient vetting. SHA1 would only have been acceptable between 1995 (its release) and 2000 or so. Though even then the practice of key stretching was known: IIRC /etc/shadow has done that since the beginning, running 1000 iterations of MD5 by default. Looking it up, that was released in 1987. 25 years!
http://valerieaurora.org/hash.html

It's BS to think sha1 was the best hash you could pick in 2012.

This is a chart of general-purpose hashes, not password hash constructions. All the hashes on Valerie's chart are bad password hashes.
I said nothing at all about sha1 being "the best ... you could pick". You read that in from somewhere.

I said it (sha1) was significantly better than MD5 or plaintext. That neither says nor implies that sha1 is best, just that it was better than other options that some might have chosen in 2012.

And that is false, sorry to say. Plainly false. The weaknesses unique to MD5 (in 2012) and SHA1 (in 2016) don't matter for password hash constructions. The weaknesses shared by salted MD5, SHA1, SHA2, and SHA3 --- each a distinct construction from the underlying hash --- matter hugely for password storage.

The problem is that MD5, SHA1, SHA2, and SHA3 are not password hashes. The password hash constructions in common use are PBKDF2, bcrypt, scrypt, and Argon2. Some of them use SHA2 as a primitive, some of them don't, but none of them work by simply concatenating a salt with a password and hashing.

It doesn't matter if it's a "password hash" if it's a cryptographically secure hash and a long enough password. If it can withstand all the attacks that give you shortcuts to finding out what the input was, given the output, it's fine.

Password hashes only help protect against brute force searches by increasing the cost to attack linearly with the cost to verify. But that isn't a great tradeoff and isn't future-proof.

All the crypto engineering that goes into password hashes is about the fact that passwords aren't long enough, so your "if" caveat makes your argument rather disconnected from the real world. People won't use passwords with a sufficient amount of entropy; they couldn't even if they wanted to (because of memorization difficulties, typos, the lack of good text entry UI on mobile devices, etc.).

As long as you're using a password entry field designed for manual entry, you can't credibly counter that with "people should use password managers and autogenerated long line-noise passwords". Because you can't base your security upon all your users taking the initiative and doing the power-user non-default thing.

In the 2007-2012 era, SHA1 was common. Also, the salts will slow down cracking a little for passwords not already known.
Nobody uses rainbow tables or cares to mitigate them. People care that GPU rigs get hundreds of billions of hashes per second[1] against a single-iteration salted hash. So all 8-char case-sensitive alphanumeric combinations can be checked in 18 minutes[2].

1 - https://gist.github.com/epixoip/a83d38f412b4737e99bbef804a27...

2 - pow(26+26+10, 8) / (2*pow(10, 11)) / 60
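
The arithmetic behind footnote 2, spelled out in Python. The rate of 2 * 10^11 guesses per second is the figure the footnote assumes for the GPU rig; the variable names are mine.

```python
ALPHABET = 26 + 26 + 10   # lowercase + uppercase + digits
LENGTH = 8                # password length in characters
RATE = 2 * 10**11         # assumed guesses per second for the whole rig

keyspace = ALPHABET ** LENGTH   # every possible 8-char alphanumeric password
seconds = keyspace / RATE       # time to try all of them
minutes = seconds / 60

print(keyspace)            # 218340105584896
print(round(minutes, 1))   # 18.2
```

Note the parentheses around the rate: dividing by 2 and then multiplying by 10^11 would give a wildly different number.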

If you are attacking just one password, that makes sense. But if you want to check all the compromised accounts for easy to guess passwords, a salt will increase the cost.
Salt won't save you. To check the most common passwords against a stolen database, you try the top one million most common passwords against each hash, at a rate of 200,000 hashes per second.

A dictionary-based attack that tries variants and inserts digits and spends one second per hash will catch the less common passwords.

No, they won't.