by js2 603 days ago
> file name does not include its full path

No, it is the full path that's considered. Look at the commit message on the first commit in the `--full-name-hash` PR:

https://github.com/git-for-windows/git/pull/5157/commits/d5c...

Excerpt: "/CHANGELOG.json" is 15 characters, and is created by the beachball [1] tool. Only the final character of the parent directory can differentiate different versions of this file, but also only the two most-significant digits. If that character is a letter, then this is always a collision. Similar issues occur with the similar "/CHANGELOG.md" path, though there is more opportunity for differences in the parent directory.

The grouping algorithm puts less weight on each character the farther it is from the right side of the name:

  hash = (hash >> 2) + (c << 24)

The hash is 32 bits wide. Each 8-bit char (from the full path) in turn is added to the 8 most significant bits of the hash, after shifting the previous hash bits right by two bits. After 16 such shifts a character's bits have fallen off the low end entirely, which is why only the final 16 chars affect the final hash. Look at what happens in practice:

https://go.dev/play/p/JQpdUGXdQs7

Here I've translated it to Go and compared the final value of "aaa/CHANGELOG.md" to "zzz/CHANGELOG.md". Plug in various values for "aaa" and "zzz" and see how little they influence the final value.

2 comments

Sounds like it needs to be switched to FNV-1a.
No, the problem isn't the hash. It does what it was designed to do. It was simply tuned for a use case that fits the Linux kernel better than Microsoft's. Switching the hash wouldn't improve either situation. If you want to understand this more deeply, see the linked PRs.
Thanks for the deep dive!