That assertion is much easier to make now, with the knowledge we have in 2017, five years later.
But without knowledge of what was coming for sha1 in five years, back in 2012 it would have been a much better choice than either MD5 or plaintext storage.
However, even today, with the knowledge we now have regarding sha1, if one's choices are limited (for some strange reason) to only sha1 or MD5, sha1 is still a better choice than MD5. Yes, sha1 is weak, and it should clearly not be used for any new designs, but sha1 is still stronger than MD5.
Also note, the 2012 date was when they last used sha1, not when they started using it. That fact is somewhat critical to keep in mind. They last used sha1 in 2012. What got leaked were some leftover hashed passwords that never got updated to bcrypt and were still hanging around in their database (probably because those accounts have not logged in during the last five years and so were never forced through a password change).
No. For similar reasons, salted SHA-2 is also not materially better than MD5. You think this is about the strength of the underlying cryptographic hash, but that has in fact very little bearing on the strength of the password hash construction.
Clearly there is some critical piece of knowledge that I'm lacking, so please help me understand where my misunderstanding lies.
The article announcing the breach contains the term "SHA1" in exactly two places: "passwords (hashed using SHA1 with a salt;" and "password hashing algorithm from SHA1 to bcrypt".
Absent evidence to the contrary (and the article provides none), I am reading "hashed using SHA1 with a salt" to mean they used this construction:
Hp = H(S||P) or
Hp = H(P||S)
where:
S is a salt (derivation method unstated)
P is the plaintext password
|| is byte concatenation
H( ) is a hash function (sha1 in this specific case)
applied only once to the input bytes
Hp is the "hashed salted password"
How does the strength of the construction H(S||P) (or H(P||S)) not have a direct bearing on the strength of the chosen hash? It is nothing but the chosen hash. What am I misunderstanding here?
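For concreteness, the construction as read above could be sketched like this in Python. This is only an illustration of H(S||P) and H(P||S) under the stated reading; the function name `salted_sha1`, the 16-byte random salt, and the example password are assumptions, since the article does not say how the salt is derived or which order was used.

```python
import hashlib
import os


def salted_sha1(salt: bytes, password: bytes, salt_first: bool = True) -> bytes:
    """Return H(S||P) if salt_first, else H(P||S), with H = SHA-1 applied once."""
    data = salt + password if salt_first else password + salt
    return hashlib.sha1(data).digest()


# Assumption: a 16-byte random salt; the article leaves the derivation unstated.
salt = os.urandom(16)
hp = salted_sha1(salt, b"hunter2")  # Hp, the "hashed salted password" (20 bytes)
```

Note that computing Hp costs exactly one SHA-1 call per password, whichever order is used.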
Forget about the strength of the underlying hash. That's not how you recover passwords from hashed password databases. In reality, the way you recover passwords is to take a dictionary starting at AARDVARK and work your way to ZEBRA and every alphanumeric string in between, hashing each one (with the stored salt) and comparing it to the target hash. Because MD5, SHA1, SHA2, Blake, Blake2, and SHA3 are all designed to be as fast as possible, this attack is extremely effective, and can be accelerated dramatically with GPUs.
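A minimal sketch of that guessing loop, assuming the single-pass H(S||P) construction discussed earlier in the thread. The function name `crack`, the salt bytes, and the three-word list are made up for illustration; a real attack would run billions of candidates on GPUs rather than a Python loop.

```python
import hashlib
from typing import Iterable, Optional


def crack(target: bytes, salt: bytes, candidates: Iterable[str]) -> Optional[str]:
    """Hash each candidate with the known salt and compare it to the leaked hash."""
    for word in candidates:
        if hashlib.sha1(salt + word.encode("utf-8")).digest() == target:
            return word
    return None


# Hypothetical leaked record: the salt is stored next to the hash, so the attacker has both.
salt = b"\x01\x02\x03\x04"
leaked = hashlib.sha1(salt + b"zebra").digest()

wordlist = ["aardvark", "password", "zebra"]
found = crack(leaked, salt, wordlist)
```

Nothing here depends on a collision or any other weakness in SHA1. The loop works the same with MD5 or SHA2 swapped in, and the only cost per guess is one fast hash call.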
The "password hashes" PBKDF2, bcrypt, scrypt, and Argon2 are all designed, the same way a KDF is designed, to mitigate this attack. All of them have a "work factor" that requires you to iterate the underlying hashing primitive (which might very well be SHA2) many times before arriving at the answer.
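As a rough illustration of a work factor, here is PBKDF2 via Python's standard `hashlib.pbkdf2_hmac`. The function name `hash_password`, the iteration count of 100,000, and the 16-byte salt are placeholder assumptions for the sketch, not recommendations from this thread.

```python
import hashlib
import os


def hash_password(password: bytes, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a password hash with PBKDF2-HMAC-SHA256.

    The iterations argument is the work factor: every guess an attacker makes
    costs that many rounds of the underlying hash instead of one.
    """
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations)


salt = os.urandom(16)  # assumed salt size
stored = hash_password(b"hunter2", salt)

# Verification recomputes with the same salt and iteration count.
matches = hash_password(b"hunter2", salt) == stored
```

The underlying primitive is still SHA2; what changes the attacker's economics is the iteration count, which multiplies the cost of each dictionary guess.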
SHA1 and SHA2 aren't password hashes. That's what people here keep trying to explain. None of the well-understood flaws in MD5 and SHA1 are really relevant to the password hash setting. They're a disaster for cryptographic signature constructions, but they do not matter at all for passwords.
Sha1 hasn't been the recommended best practice for a very long time. (Really, ever.) Bcrypt dates back to 1999. Even if you give it 10 years for evaluation, it would have had to be considered by 2009. And indeed it was recommended in 2007, 5 years before this breach. RFC 2898 (PBKDF2) came out in 2000, 12 years before this breach. Scrypt was released in 2009, so I could understand not adopting it by 2012 out of concern for insufficient vetting. Sha1 would only have been acceptable between 1995 (its release) and 2000 or so. Though even then the practice of key stretching was known: IIRC /etc/shadow has done that since the beginning, running 1000 iterations of MD5 by default. Looking it up, that was released in 1987. 25 years!