How would Git handle a SHA-1 collision on a blob? (stackoverflow.com)
33 points by ooronning 3420 days ago
2 comments

answer straight from the source: http://marc.info/?l=git&m=115678778717621&w=2
>you somehow are very very unlucky, and two files end up having the same SHA1. At that point, what happens is that when you commit that file (or do a "git-update-index" to move it into the index, but not committed yet), the SHA1 of the new contents will be computed, but since it matches an old object, a new object won't be created, and the commit-or-index ends up pointing to the _old_ object.

>You won't notice immediately (since the index will match the old object SHA1, and that means that something like "git diff" will use the checked-out copy), but if you ever do a tree-level diff (or you do a clone or pull, or force a checkout) you'll suddenly notice that that file has changed to something _completely_ different than what you expected. So you would generally notice this kind of collision fairly quickly.
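
A rough sketch of the behavior described in that quote, not Git's actual code: a toy content-addressed store that never overwrites an existing key, so a colliding write ends up pointing at the _old_ object. The `blob <size>\0` header is how Git hashes blobs; `ToyObjectStore`, `hex_digits`, and `first_collision` are made-up names for illustration, and the hash is cut to one hex digit (4 bits) only so that a collision is easy to force.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Hash content the way Git hashes a blob: SHA-1 over 'blob <size>\\0' + content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ToyObjectStore:
    """Hypothetical in-memory object store keyed by a (possibly truncated) blob hash."""

    def __init__(self, hex_digits: int = 40):
        # hex_digits=40 keeps the full SHA-1; smaller values are an assumption
        # used here to make collisions cheap to produce.
        self.hex_digits = hex_digits
        self.objects = {}

    def write_blob(self, content: bytes) -> str:
        key = git_blob_sha1(content)[: self.hex_digits]
        # If the key already exists, no new object is created and the old
        # content stays in place, as the quoted post describes.
        if key not in self.objects:
            self.objects[key] = content
        return key

    def read_blob(self, key: str) -> bytes:
        return self.objects[key]


def first_collision(store: ToyObjectStore, count: int = 17):
    """Write distinct contents until one reads back as something else."""
    for i in range(count):
        content = f"file version {i}\n".encode()
        key = store.write_blob(content)
        stored = store.read_blob(key)
        if stored != content:
            return key, stored, content
    return None


if __name__ == "__main__":
    # 17 distinct contents into 16 possible keys must collide at least once.
    store = ToyObjectStore(hex_digits=1)
    key, old, new = first_collision(store)
    print(f"key {key}: wrote {new!r}, but the store still returns {old!r}")
```

The write itself reports no error; the mismatch only shows up when the object is read back, which roughly corresponds to the "force a checkout" case in the quote.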

That seems like a dangerously dubious assumption.

It would imply that every coder has a clear, accurate mental expectation of what every file should look like as of the latest commit and will notice it before further commits are added.

I consider it a good day when everyone remembers to rebase in the right order!

Unless I'm reading it wrong (and quite possibly I am), that may have been the intended behavior, but it seems to conflict with reality:

Someone tried it, and it looks like it silently corrupted the repo

The corruption may be because their patch, which reduced git's hash to 4 bits, doesn't take all of git's internal behavior into account.