by spappal 2098 days ago
> the properties of the hashes [g]it uses

Git uses SHA-1 (a hardened version since 2017) and is now doing per-repo upgrades to SHA-256 [0]. Lots of repos are presumably still on SHA-1, and plenty of users are still on older versions of git.
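To make "the hashes git uses" concrete, here is a minimal sketch of how a blob's object ID is derived: git hashes a short header (`<type> <size>` followed by a NUL byte) plus the content. The function name `git_object_id` is mine, and the sketch uses plain SHA-1 from Python's `hashlib`, not the hardened collision-detecting implementation git ships, so treat it as an illustration rather than git's actual code path.

```python
import hashlib

def git_object_id(content: bytes, obj_type: str = "blob", algo: str = "sha1") -> str:
    """Hash `content` the way git names loose objects: header + NUL + content."""
    # Header is "<type> <size in bytes>" followed by a single NUL byte.
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.new(algo, header + content).hexdigest()

# The empty blob has a well-known ID in SHA-1 repositories.
print(git_object_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Same construction with SHA-256, as used by repos created with the new object format.
print(git_object_id(b"", algo="sha256"))
```

You can compare the SHA-1 output against `git hash-object` on the same bytes.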

As of 2020, chosen-prefix attacks against SHA-1 are now practical. [verbatim from 1] But I don't think second preimage attacks are practical yet.

Linus Torvalds argued in 2006 basically that it's irrelevant whether git's hash function is second preimage resistant. Selective quoting:

> remember that the git model is that you should primarily trust only your _own_ repository [2]

> [a malicious] collision is entirely a non-issue: you'll get a "bad" repository that is different from what the attacker intended, but since you'll never actually use his colliding object, it's _literally_ no different from the attacker just not having found a collision at all [2]

All that is just to say: git originally chose its hash for the above-mentioned "git model", and thus didn't care 100% about second preimage resistance. For your suggested search engine, depending on how the database is collected, you might not be able to trust "your own repository" (if it's crowdsourced, I could register another codebase with the same hash as Linux). A second-preimage-resistant hash function would be a requirement for the suggested use case.
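A toy sketch of that failure mode, under loud assumptions: `weak_hash` is SHA-1 truncated to 16 bits so that a second preimage can be brute-forced in well under a second (nobody can do this against full SHA-1 today, as far as I know), and the `index` / `register` names and the "latest submission wins" policy are hypothetical, not anything from the proposed search engine.

```python
import hashlib
from itertools import count

def weak_hash(data: bytes, bits: int = 16) -> str:
    """Deliberately weak toy hash: the first `bits` bits of SHA-1, hex encoded."""
    return hashlib.sha1(data).hexdigest()[: bits // 4]

def find_second_preimage(target: bytes, bits: int = 16) -> bytes:
    """Brute-force a different input whose weak hash equals that of `target`."""
    wanted = weak_hash(target, bits)
    for i in count():
        candidate = b"evil-%d" % i
        if candidate != target and weak_hash(candidate, bits) == wanted:
            return candidate

# Hypothetical crowd-sourced index: hash -> registered codebase.
index = {}

def register(content: bytes) -> None:
    # Assumed naive policy: whoever submits last owns the hash.
    index[weak_hash(content)] = content

linux = b"the real Linux source tree"
register(linux)

# The attacker only needs the public hash of the honest entry.
forged = find_second_preimage(linux)
register(forged)

# Looking up "Linux" by hash now returns the attacker's content.
print(index[weak_hash(linux)] == forged)  # True
```

A "first submission wins" policy doesn't save you either, since the attacker can simply register first. Without a local copy you already trust, the hash function itself has to carry the guarantee.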

[0]: https://git-scm.com/docs/hash-function-transition/

[1]: https://en.wikipedia.org/wiki/SHA-1#cite_ref-8

[2]: https://marc.info/?l=git&m=115678778717621&w=2