by jlgaddis 3378 days ago
Perhaps you haven't read Linus' comments where he stated (more than a decade ago) that the usage of SHA1 here isn't for "security"?

(Hint: that's why GPG signing commits is an option.)

I read those comments more than a decade ago. They seemed weak but tolerable then. They seem broken now. Git is supposed to guarantee that the code I see is the code the author saw, in a distributed and decentralized environment. This is Git's entire reason for existing.

A secure design is essential for trusting this functionality. My trust in Git has always been tempered by the weakness of SHA1.

A GPG signature is no stronger than its object ref.

Have you seen how many frameworks believe "auto-pull and compile deps by hash from github" is reasonable? They are assuming this isn't a massive attack vector. They are trying to build on a core feature that Git claims to have.

Recent events moved this from probably foolish to provably so.

When you GPG sign a commit, you just GPG sign its hash; you're not signing its diff alongside it.
That's what comes to mind every time someone brings up Linus' comments from way back when. If SHA-1 is insecure, then there is no way to have security. Forge an object, and GPG sign its commit, and you have broken the apparent security GPG signing was meant to bring. If SHA-1 was not meant for security, then security must have been a non-goal of Git.

The comments are usually brought up to explain why Linus didn't think much of it at the time, whereas they actually demonstrate the shift of thinking around what Git is meant to provide. Security is definitely a goal now, and the hash function is the critical piece of security infrastructure.

GPG signatures actually sign the hash digest of the text they're given. Fun fact, which I think (hope) changed in recent versions of GPG: the hash, by default, is (was?) SHA-1.

One can check what is used with e.g.

  $ git cat-file -p $some_tag | gpg --list-packets | grep "digest algo"
The output is of the form

  digest algo n, begin of digest xx yy
Where n can be:

  1: MD5
  2: SHA1
  8: SHA256
  10: SHA512
(See RFC 4880, section 9.4, for all values.)
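
If you'd rather not decode the number by hand, here is a minimal Python sketch of that lookup. The ID table is taken from RFC 4880, section 9.4; the function name and the sample line are made up for illustration, and it only assumes the "digest algo n" wording shown above.

```python
import re

# Hash algorithm IDs from RFC 4880, section 9.4.
DIGEST_ALGOS = {
    1: "MD5",
    2: "SHA1",
    3: "RIPEMD160",
    8: "SHA256",
    9: "SHA384",
    10: "SHA512",
    11: "SHA224",
}


def digest_algo_name(packet_line):
    """Return the hash name for a 'digest algo n, ...' line, or None if absent."""
    match = re.search(r"digest algo (\d+)", packet_line)
    if match is None:
        return None
    algo_id = int(match.group(1))
    # Unknown or reserved IDs are reported rather than guessed.
    return DIGEST_ALGOS.get(algo_id, "unknown ({})".format(algo_id))


if __name__ == "__main__":
    # Hypothetical sample line in the format quoted above.
    print(digest_algo_name("digest algo 2, begin of digest ab cd"))
```

Feed it each line of the `gpg --list-packets` output and keep the non-None results.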
Interesting, I didn't know! Although it makes a lot of sense now that you bring it up.

I don't think it changes anything though, because of git's integrity. Stop me if I'm getting this wrong but, if you wanted to attack a signed git commit through the gpg signature's hash, you would have to modify the commit object itself... which yields a different commit hash in order to be valid. You'd have to get absurdly lucky to have a signature collision that contains a (valid) commit hash collision.

The text that GPG signs on a git tag is:

  object $sha1
  type commit
  tag $name
  tagger $user $timestamp $tz

  $text
If you wanted to attack a signed git commit through the gpg signature's hash, you would have to do a second preimage attack on that text with a different commit sha1.

OTOH, if you wanted to attack a signed git commit through the git commit sha1, you would have to do a second preimage attack on that commit text, which is of the form:

  commit $length\0tree $sha1
  parent $parent_sha1
  author $author $author_timestamp $author_tz
  committer $committer $committer_timestamp $committer_tz

  $text
See where I'm going? It's the same kind of attack.

Another way to attack it would be to do a second pre-image attack on the pointed tree, which is harder because there is not really free-form text available in a tree object.

Yet another way to attack it would be to do a second pre-image attack on one of the blobs pointed to by a tree, where the format is of the form:

  blob $length\0$content
I don't think this is significantly easier than any of the second pre-image attacks mentioned above.

So in every case, to attack a GPG-signed git tag, you need a second pre-image attack on the hash. If git uses something better than SHA-1, but GPG still uses SHA-1, the weakest link becomes, ironically, GPG.

That being said, second pre-image attacks are pretty much impractical for most hashes at the moment, even older ones that have been deemed broken for many years (like MD5 or even MD4 (TTBOMK)).

That is, even if git were using MD4, you couldn't replace an existing commit, tree or blob with something that has the same MD4.

Edit:

In fact, here's a challenge:

Let's assume that git can use any kind of hash instead of SHA1. Let's assume I have a repository with a single commit with a single tree that contains a single source file.

The source file is:

  $ cat hackme.c
  #include <stdio.h>

  int main() {
    printf("Hack me, world!\n");
    return 0;
  }
So that we all talk about the same thing, here is the raw sha1 for this source:

  $ sha1sum hackme.c
  cffc02c09faf2e9a83ecbb976e1304759868cf1c  hackme.c
And its git SHA1:

  $ git hash-object hackme.c
  36134c8c8e9fdf705441dcc1f71736064afc7c44
Here is how you can create this SHA1 without git:

  $ (echo -e -n blob $(stat -c %s hackme.c)\\x0; cat hackme.c) | sha1sum
  36134c8c8e9fdf705441dcc1f71736064afc7c44  -
or

  $ (echo -e -n blob $(stat -c %s hackme.c)\\x0; cat hackme.c) | openssl sha1
  (stdin)= 36134c8c8e9fdf705441dcc1f71736064afc7c44
And for git variants that would be using MD5:

  $ (echo -e -n blob $(stat -c %s hackme.c)\\x0; cat hackme.c) | openssl md5
  (stdin)= 1b56dbc6613ff340b324ca973aec67f9
Or MD4:

  $ (echo -e -n blob $(stat -c %s hackme.c)\\x0; cat hackme.c) | openssl md4
  (stdin)= 0eaabfc1a32629dce98c476f591c3f60
The challenge is this: attack the hypothetical repository using the hash of your choosing[1]; replace that source with something that is valid C, because people using the content of the repository will be compiling the source. Obviously, you'll need the hash to match for "blob $length\0$content" where $length is the length of $content, in bytes, and $content is your replacement C source code.

1. let's say, pick any from the list on http://valerieaurora.org/hash.html

I posit you'll spend a lot of time, resources and money on the problem (exponentially more than Google did with SHAttered), except for Snefru.

I was only talking about git commits though. For tags we agree, as the tag is only a pointer (https://twitter.com/Adys/status/835595116110823425).

But for the commit it's different, because the $text in your example affects the hash of the commit itself. And my understanding is that if you sign the commit, you're signing both the contents and the hash of the content. Am I incorrect?

If you sign the commit, you sign the exact text I quoted, where $text is what you pass to the `-m` argument to `git commit`.
Yes, Linus wrote that SHA1 isn't here for security, but that was a glaring misunderstanding of security on his part. Integrity protection of source code is a security function.
I think it's mainly due to a different threat model. Linus only pulls from his trusted lieutenants, who are unlikely to try to attack the source in that way (it's way easier to simply hide a bad commit in the lot, no need to fiddle with SHA1). They do the same.

The rest of the code is sent through mailing lists as patches, so the hash is irrelevant.

SHA1 here protects against "random" corruption (which is more than some VCS do), but not against an attacker. At no point is anyone able to send trusted contributors bad commit objects.

Now, the way most people use git is very different from the kernel (or git) style, so their threat model is different, and SHA1 may become a security function.

I understand your point. However, that doesn't take into account defense in depth, which says that more than a single control should be in place.
Well, if SHA1 isn't for security, there's no reason to switch away from it today.