> the properties of the hashes [g]it uses

Git uses SHA-1 (a hardened version since 2017) and is now rolling out per-repo upgrades to SHA-256 [0]. Lots of repos are presumably still on SHA-1, and lots of users are still on older versions of git.

As of 2020, chosen-prefix attacks against SHA-1 are now practical. [verbatim from 1] But I don't think second-preimage attacks are practical yet.

Back in 2006, Linus Torvalds argued, basically, that it's irrelevant whether git's hash function is second-preimage resistant. Selective quoting:

> remember that the git model is that you should primarily trust only your _own_ repository [2]

> [a malicious] collision is entirely a non-issue: you'll get a "bad" repository that is different from what the attacker intended, but since you'll never actually use his colliding object, it's _literally_ no different from the attacker just not having found a collision at all [2]

All that is just to say: git originally chose its hash with the above-mentioned "git model" in mind, and so didn't care 100% about second-preimage resistance.

For your suggested search engine, depending on how the database is collected, you might not be able to trust "your own repository" (if it's crowdsourced, I could register another codebase with the same hash as Linux). A second-preimage-resistant hash function would be a requirement for the suggested use case.

[0]: https://git-scm.com/docs/hash-function-transition/
[1]: https://en.wikipedia.org/wiki/SHA-1#cite_ref-8
[2]: https://marc.info/?l=git&m=115678778717621&w=2
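For anyone who wants to see concretely what "the hash git uses" covers, here is a minimal sketch of how a blob's object ID is derived: a short `blob <size>\0` header followed by the file contents, fed through the repo's hash function. The function name `git_blob_id` is my own, not anything from git, and I'm assuming SHA-256 repos use the same header layout with only the hash function swapped. Plain `hashlib` SHA-1 stands in for git's hardened SHA-1, which should give the same digest for ordinary inputs.

```python
import hashlib


def git_blob_id(data: bytes, algo: str = "sha1") -> str:
    """Hypothetical helper: compute the object ID git would assign to a blob.

    Assumption: the object is hashed as b"blob <size>\\0" + contents, with
    `algo` being "sha1" (classic repos) or "sha256" (upgraded repos).
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.new(algo, header + data).hexdigest()


# The empty blob, a value that shows up in almost every SHA-1 repository.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Same content, two hash functions: the ID is all a database would key on.
print(git_blob_id(b"hello world\n"))
print(git_blob_id(b"hello world\n", "sha256"))
```

The point for the search-engine idea: the ID is the only thing tying a database entry to its content, so whoever can produce different content with the same ID can impersonate that entry.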