	by bk2204 3386 days ago
	I'm the person who's been working on this conversion for some time. This series of commits is actually the sixth, and there will be several more coming. (I just posted the seventh to the list, and I have two more mostly complete.) The current transition plan is being discussed here: https://public-inbox.org/git/CA+dhYEViN4-boZLN+5QJyE7RtX+q6a...

5 comments

rurban 3385 days ago

I do like your hashname/nohash idea. If only we could also come up with a simple compression negotiation protocol: zlib -> zstd. But this will be much harder, as hashes are internal only, and compression is in the protocol.

Kudos to brian m. carlson for convincing Linus to use SHA3-256 over SHA-256. This is really the only sane option we have.

weinzierl 3385 days ago

I don't understand what you mean by "hashes are internal only". Aren't the SHA-1s everywhere right now? I mean not only in the protocol but also in the UI, and from there they have even spread into bug trackers, documentation, and so forth.

lisper 3385 days ago

> this is really the only sane option we have

Why?

wolf550e 3385 days ago

Yeah, I would have gone with BLAKE2. It's much faster than SHA-256 and SHA3-256: https://blake2.net/skylake.png
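For anyone who wants to check the speed claim on their own hardware, here is a rough sketch using Python's standard `hashlib`. The function name `time_hashes`, the 16 MiB zero-filled input, and the round count are arbitrary assumptions of this sketch, not anything from the Git discussion. Note that `hashlib`'s `blake2b` defaults to a 64-byte digest, and that the numbers reflect whatever backend your Python build uses, so treat them as indicative only.

```python
import hashlib
import time


def time_hashes(data, rounds=5):
    """Return the best-of-`rounds` wall-clock seconds to hash `data`, per algorithm."""
    results = {}
    for name in ("sha1", "sha256", "sha3_256", "blake2b"):
        best = float("inf")
        for _ in range(rounds):
            start = time.perf_counter()
            hashlib.new(name, data).digest()
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


if __name__ == "__main__":
    payload = b"\0" * (16 * 1024 * 1024)  # arbitrary test input, not real repo data
    for name, seconds in time_hashes(payload).items():
        print(f"{name:10s} {len(payload) / seconds / 1e6:8.1f} MB/s")
```

This measures raw hashing throughput only. It says nothing about how much of a real Git operation is spent hashing as opposed to disk I/O or zlib.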

benchaney 3385 days ago

This is a perfect example of a situation where hashing performance doesn't matter at all.

tikhonj 3385 days ago

I'm not familiar with Git internals. Does the performance of the hashing algorithm contribute significantly to how Git deals with large files or with operations over a large number of small files?

I've run into performance problems with things like MathJax, which includes thousands (or tens of thousands?) of files as a backup method for rendering equations. (I understand each file has a single character in some typeface.)

harshreality 3385 days ago

The hash function may not matter for overall git performance in virtually all dev machine setups, but there will be a (maybe tiny, maybe larger, depending on the repo and disk io speed) difference in cpu utilization and heat generation, right?

copperx 3385 days ago

That's a silly thing to worry about when you're developing Ruby or Java applications. My PC boots faster than the Rails console or IntelliJ.

harshreality 3385 days ago

sha3 will probably get hw accel eventually. Blake2 is less likely to. It's like the dilemma between chacha20 and a stream cipher mode for aes. An argument could be made for either, depending on application specifics and available hardware.

wolf550e 3384 days ago

But like its ancestor chacha, blake2 is fast on anything that has SIMD.

weinzierl 3385 days ago

Did you make any measurements or back-of-the-envelope calculations of what the real-world performance impact of this change is?

I don't expect anything horrible, but still curious.

EDIT: After skimming OP I found a few answers.

The message from the Keccak Team [1] is especially interesting. The summary is that we don't have to worry about performance degradation from the hash calculation itself. There is a palette of functions that are considered to have a "security level [...] appropriate for your application" and are considerably faster than SHA1.

[1] https://public-inbox.org/git/91a34c5b-7844-3db2-cf29-411df5b...

hsivonen 3385 days ago

If git changed to BLAKE2b, I'd expect a perf improvement over SHA-1.

sorenbs 3386 days ago

Out of curiosity: when did you start to take the first serious steps in this direction?

bk2204 3386 days ago

From the commit history, 2015 (commit 5f7817c85d4b5f65626c8f49249a6c91292b8513).

I proposed the idea of improved compile-time checking and maintainability, as there wasn't originally much interest in a new hash function, but the maintainability improvements were something people could go for.

I hadn't spent as much time working on it as I am now, so it moved slowly. Other people also helped by converting parts of the code that they were working on (like parts of the refs subsystem).

sorenbs 3385 days ago

Thanks!

drostie 3386 days ago

I'm not that familiar with the Git internals. How do you deal with the problem of having different non-leaf nodes scattered through the directory tree?

This might be a non-issue based on how Git stores the tree, but I can imagine one simple model where each directory would be a sort of "collection object", a binary encoding of a list of (filename, hash) pairs in filename order, and therefore the directory gets a hash of its own. But that means that when you're communicating with a SHA-1 repository you don't just need to rename this object; its contents also need to be changed pre-rename, and then you need to store every internal node twice. I'm not seeing that in your summary.

Is it just that Git doesn't have any internal nodes in the directory tree per se because the "filename" is a full POSIX path with subdirs? Or what?

evmar 3386 days ago

https://git-scm.com/book/en/v2/Git-Internals-Git-Objects has descriptions of the objects. Both trees and commits are hashes over data that includes hashes of other objects, so they must be different. The doc discusses converting them at transmission time; search for [convert to sha256] in it.
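To make the point about embedded hashes concrete, here is a small sketch of how blob and tree objects are hashed, following the object layout described in the Pro Git chapter linked above (a `<type> <size>\0` header, then the body; tree entries are `<mode> <name>\0` followed by the raw child ID). The helper names and sample files are made up for illustration, and the sort is simplified to plain file names, which is only right for a tree with no subdirectories. Using `sha256` as a drop-in here is an assumption of the sketch, not a description of the final transition design.

```python
import hashlib


def object_id(algo, kind, body):
    """Hash a Git-style object: '<kind> <size>' + NUL header, followed by the body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.new(algo, header + body).digest()


def tree_body(entries, algo):
    """Serialize (mode, name, blob_content) entries; each one embeds the raw blob ID."""
    out = b""
    for mode, name, content in sorted(entries, key=lambda e: e[1]):
        out += f"{mode} {name}\0".encode() + object_id(algo, "blob", content)
    return out


# Hypothetical two-file directory.
entries = [
    ("100644", "README", b"hello\n"),
    ("100644", "main.c", b"int main(void) { return 0; }\n"),
]

sha1_tree = tree_body(entries, "sha1")
sha256_tree = tree_body(entries, "sha256")

# The serialized trees differ in content and length, not just in their own IDs,
# because every entry carries a 20-byte or a 32-byte child hash.
print(len(sha1_tree), len(sha256_tree))
print(object_id("sha1", "tree", sha1_tree).hex())
print(object_id("sha256", "tree", sha256_tree).hex())
```

So the tree bytes themselves have to be rewritten when moving between hash functions, which is why a simple rename of object IDs is not enough.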

snakeanus 3385 days ago

>b. A SHA256 repository can communicate with SHA-1 Git servers and clients (push/fetch).

Wouldn't fetching from a SHA-1 repository degrade security? I think it would be better to show a warning (similar to what OpenSSH does with 1024-bit DSA keys) every time you try to fetch from a SHA-1 Git repo. The same goes for pushing a signed commit to a SHA-1 repository.

bhhaskin 3385 days ago

The sha1 hash isn't used for security. You should be signing your commits if security is a concern.

nshepperd 3385 days ago

Uh, even a signed commit does still rely on the sha1 hash of the actual tree object and any parent commits. It won't stop something bad from happening if you fetch from a sha1 repo.
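A rough sketch of why that is: the text of a commit object names its tree and parents only by ID, so a signature over that text covers the file contents only through the hash. The helper names, the author line, and the sample tree bytes below are placeholders invented for this sketch, and no actual signing is performed.

```python
import hashlib


def object_id(kind, body):
    """SHA-1 over a Git-style '<kind> <size>' + NUL header plus the body, as hex."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_payload(tree_id, parent_ids, message):
    """Build the commit text that a signature would be computed over."""
    lines = [f"tree {tree_id}"]
    lines += [f"parent {p}" for p in parent_ids]
    lines += [
        "author A U Thor <author@example.com> 0 +0000",     # placeholder identity
        "committer A U Thor <author@example.com> 0 +0000",  # placeholder identity
        "",
        message,
    ]
    return ("\n".join(lines) + "\n").encode()


# Placeholder bytes standing in for a serialized tree object body.
tree_bytes = b"100644 README\0" + bytes(20)
tree_id = object_id("tree", tree_bytes)
payload = commit_payload(tree_id, [], "Initial commit")

# The payload contains the 40-hex-digit tree ID, never the tree contents.
print(payload.decode())
```

Any second tree that hashed to the same ID would match the same signed payload, which is the dependence on SHA-1 described above.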
