by dahart 3380 days ago
That's exactly what this change is. You mean why wasn't it that way before the change? Maybe because it wasn't ever needed before? Git's been good with only sha-1 for 12 years. Think about the flip side of your question... what were the alternatives 12 years ago, or 5 years ago? And why would someone write code for alternatives that aren't expected to be used and maybe don't exist?

In my experience, generalizing ahead of need more often than not causes problems, and I've watched over-engineering result in far more effort to fix when the need it was anticipating does arrive than just waiting until the need is there.

6 comments

> Think about the flip side of your question... what were the alternatives 12 years ago, or 5 years ago?

SHA-2 and RIPEMD.

> And why would someone write code for alternatives that aren't expected to be used and maybe don't exist?

That's the problem: the software industry is still suffering from MD5 getting cracked [0]! Cryptographic agility is a baseline requirement for security primitives.

> In my experience, generalizing ahead of need more often than not causes problems

I agree, and Linus has valid complaints about the security recommendations made over the 25-year history of Linux: most of them kill performance and are only partial fixes, so why bother?

But Linus is also engaging in premature optimization: computers are ~30 billion times faster than when he first started programming Linux. Yes, SHA-2 is relatively slow, but they could have at least not hardcoded SHA-1 into the codebase and protocol.

> I've watched over-engineering result in far more effort to fix when the need it was anticipating does arrive than just waiting until the need is there.

You clearly haven't done any safety-related engineering. That's the thing about cryptography: millions of dollars and human lives are at stake. Despite the smartest people in the world working on these problems, cryptographic primitives/protocols are regularly broken. Due to quantum computing, every common cryptographic primitive we use today will need to be replaced or upgraded at some point.

Thankfully, you don't need to worry about the engineering of a given cryptographic primitive as long as you can swap it out with a new one. But when you hardcode a specific hash function and length into your protocol/codebase you are now assuming the role of a cryptographer.
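
To make the swap-out point concrete, here is a minimal Python sketch, not git's actual code. The hash algorithm is a parameter rather than a constant, and the object ID records which algorithm produced it, so no digest length is baked in anywhere. `ObjectId` and `hash_blob` are hypothetical names for illustration; the `blob <size>\0` header is how git frames blob contents before hashing.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectId:
    """Hypothetical object ID that records which hash produced it."""
    algo: str      # e.g. "sha1" or "sha256"
    digest: bytes  # raw digest; its length depends on algo, never assumed

    def hex(self) -> str:
        return self.digest.hex()


def hash_blob(content: bytes, algo: str = "sha1") -> ObjectId:
    """Hash content framed as a git blob: b"blob <size>\\0" + content."""
    header = b"blob %d\x00" % len(content)
    h = hashlib.new(algo)  # algorithm is looked up by name, not hardcoded
    h.update(header + content)
    return ObjectId(algo, h.digest())


old_id = hash_blob(b"", "sha1")
new_id = hash_blob(b"", "sha256")
print(old_id.algo, len(old_id.digest), old_id.hex())
print(new_id.algo, len(new_id.digest), new_id.hex())
```

Code that stores or compares IDs goes through `ObjectId`, so adding an algorithm touches one function instead of every place that assumed a 20-byte buffer.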

[0]: https://en.wikipedia.org/wiki/Flame_%28malware%29

Even totally ignoring that SHA2 was a thing, anybody looking around would have noticed that MD4 was broken, MD5 was broken, and it would be unlikely that the hash of today would stand forever.
Yes, true. Correct. That is still true, and applies to SHA-2 as well. And Linus was aware of exactly what you say, back in 2005.

My point was that the choice that was made was considered good enough for the purposes for which it was intended. In the context of the OP's comment, criticizing git for not making different code design choices doesn't mean that Linus was wrong, it may mean that the OP doesn't know and/or understand all the considerations Linus had. And Linus has said many times that the security of the hash is not the primary consideration in his design.

Git's choice of SHA-1 was not at the time predicated on having the single most unbreakable hash in existence, the hash's use is not for security purposes, and to talk with such incredulity about Linus' choice may be to misunderstand git's design requirements.

Well, this whole mess proves that Linus was wrong. Typing "unsigned char [20]" everywhere is beyond amateurish to me in any case and raises a concern about the overall quality of the code in git and the Linux kernel.
It shows that Linus is not a cryptographer, to be more precise. Though yes, SHA-1 chosen-prefix attacks are still very expensive at this point. I wonder how many non-cryptographers knew about SHA-2 back in 2003-2004.
No, it's more than that. "unsigned char[20]" already has at least three potential points of failure (and why isn't it uint8 anyways). Moreover, it'll be referenced as unsigned char*, which opens another can of worms. And oh, by the way, have fun searching for all references to sha1 in your source code now that you weren't pro enough to create a type for your object ids / hashes.

I'm guessing it's part lack of skill in design, part bad software development tools (uEMACS and makefiles or something), and part just being against C++ et al.

Linus regularly treats security as a second-class citizen and is famous for his outrageous harassment [0]:

> Of course, I'd also suggest that whoever was the genius who thought it was a good idea to read things ONE FCKING BYTE AT A TIME with system calls for each byte should be retroactively aborted. Who the fck does idiotic things like that? How did they not die as babies, considering that they were likely too stupid to find a tit to suck on?

He deserves to eat this shit sandwich.

> I wonder how many non-cryptographers knew about SHA-2 back in 2003-2004.

Any systems engineer should have known about SHA-2. SHA-1 only provides 80 bits of security against collisions, so everyone else assumed that it would need to be replaced.
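
For anyone wondering where the 80 comes from: it is the generic birthday bound. Finding a collision in an n-bit hash takes roughly 2^(n/2) work, while finding a preimage takes roughly 2^n. A quick Python sketch of the arithmetic follows; the function names are made up for illustration, and these are generic bounds, not the cost of the best known attacks.

```python
import hashlib


def collision_bits(algo: str) -> int:
    """Generic birthday bound: about 2**(n/2) work for an n-bit digest."""
    n = hashlib.new(algo).digest_size * 8
    return n // 2


def preimage_bits(algo: str) -> int:
    """Generic preimage bound: about 2**n work for an n-bit digest."""
    return hashlib.new(algo).digest_size * 8


for name in ("sha1", "sha256", "sha512"):
    print(name, collision_bits(name), preimage_bits(name))
```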

[0]: https://en.wikiquote.org/wiki/Linus_Torvalds

What does his "outrageous harassment" have to do with his ignorance towards security?

I agree that he should've used SHA-2 or, better yet, made the hash algorithm modular, but what does your quote add to the discussion?

> MD4 was broken, MD5 was broken

There are no practical pre-image attacks for either of them yet. (2^102 for MD4, 2^123 for MD5)

> what were the alternatives 12 years ago, or 5 years ago?

SHA-2.

> And why would someone write code for alternatives that aren't expected to be used and maybe don't exist?

Well, the real question is why someone picked SHA-1 over SHA-2 in 2005 when attacks that reduced its strength were already being demonstrated.

Linus has explained why he picked SHA-1. I'm not Linus, and I'm not defending his choice, but he has said repeatedly that git's hash is primarily for indexing and error correction, and not primarily for security. Clearly he felt like SHA-1 was "good enough". And if you have something that's "good enough" there are reasons not to write code for alternatives you're not going to use.
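
The indexing and error-detection role described here can be sketched in a few lines of Python. This is a toy content-addressed store, not git's object database; `put`, `get`, and the in-memory `store` dict are hypothetical. The key is the hash of the content, so lookups are by hash, and re-hashing on read catches corruption.

```python
import hashlib

store = {}  # toy in-memory object store (an assumption for this sketch)


def put(content: bytes) -> str:
    """Store content under its own SHA-1 and return the key."""
    key = hashlib.sha1(content).hexdigest()
    store[key] = content
    return key


def get(key: str) -> bytes:
    """Fetch content and verify that it still hashes to its key."""
    content = store[key]
    if hashlib.sha1(content).hexdigest() != key:
        raise ValueError("object %s is corrupt" % key)
    return content


key = put(b"hello world\n")
assert get(key) == b"hello world\n"

store[key] = b"hello w0rld\n"  # simulate accidental corruption on disk
try:
    get(key)
except ValueError as err:
    print(err)
```

This catches accidental corruption with any decent hash. Resisting a deliberately crafted collision is a separate, stronger requirement.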
> but he has said repeatedly that git's hash is primarily for indexing and error correction, and not primarily for security

And he was wrong, as OpenPGP signatures on commits and tags are a thing.

Not sure when that feature was introduced; however, I doubt that it existed in the first version of git. That being said, he should have changed the hash function the moment that feature was introduced.

Signatures were introduced in git as part of the response to the kernel.org hack in 2011.
The first attack on SHA-1 was published in 2003. Git showed up in 2005. Not only should git have allowed for something else, it never should have used SHA-1 in the first place.
Not really adding much, but damn it, I feel old reading that.

I still freshly recall the hoopla over BitKeeper licensing that led to Torvalds creating Git.

SHA2 predates git by about four years.