by oconnor663 1269 days ago
> if you're able to sneak something into a repository in the first place (e.g. a benign file that generates a collision with a malicious file), then you're probably able to sneak in something more directly

Could you imagine using an implementation of TLS that "probably" authenticated your network traffic though? I think there are two separate reasons we prefer to make strong guarantees in cryptography:

1. That's often really what I need. If I'm downloading e.g. software updates over the network, I really need those to be authentic.

2. Even when I arguably don't need strong authenticity, like just reading some news articles, I want to use the same strong tools, because I don't want to have to study and understand (much less teach) the situations where some weaker tool fails. Inevitably I'll get that wrong or just forget, and I'll end up using the weak tool in some case where I should've used the strong one.

In this case, if I imagine teaching how commit signing works with a weak hash function, it sounds like "Signing commits means that no one can sneak malicious content into your repository, unless they first steal your secret signing key, or unless you have ever committed (or allowed anyone else to commit) a non-text file that they created." Actually writing that second part out makes it feel really bad to me.


> "Signing commits means that no one can sneak malicious content into your repository

Signing commits does not mean that, even when using a cryptographically secure hash function. All it means is that you put your signature over a particular state of the repo (and, by extension, its parent states). It has nothing to do with preventing "sneaking things in", although it could be a (small) part of the whole set of measures taken to prevent someone from doing that.
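To make "a particular state of the repo and its parent states" concrete, here is a rough sketch of how Git's SHA-1 object IDs chain together. The `<kind> <size>\0` header and the tree and commit layouts follow Git's object format as I understand it; the helper names, the `main.py` file name, and the author line are made up for illustration, and the tree serializer is simplified to plain files only.

```python
import hashlib

def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 object ID: hash of b'<kind> <size>\\0' followed by the body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

def tree_body(entries) -> bytes:
    """Serialize (mode, name, hex_id) entries. Simplified: files only, sorted by name."""
    out = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1]):
        out += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_id)
    return out

def commit_id(tree_id: str, parent_ids, message: str) -> str:
    """ID of a commit object; the author/committer lines are placeholder values."""
    who = "A U Thor <author@example.com> 0 +0000"
    lines = [f"tree {tree_id}"]
    lines += [f"parent {p}" for p in parent_ids]
    lines += [f"author {who}", f"committer {who}", "", message]
    return git_object_id("commit", ("\n".join(lines) + "\n").encode())

def snapshot(file_content: bytes, parents=()) -> str:
    """Commit ID for a one-file repository state (hypothetical helper)."""
    blob = git_object_id("blob", file_content)
    tree = git_object_id("tree", tree_body([("100644", "main.py", blob)]))
    return commit_id(tree, parents, "example commit")

if __name__ == "__main__":
    root = snapshot(b"print('v1')\n")
    child = snapshot(b"print('v2')\n", parents=[root])
    # Same child content, but the parent's file was altered:
    tampered_root = snapshot(b"print('evil')\n")
    tampered_child = snapshot(b"print('v2')\n", parents=[tampered_root])
    print(child != tampered_child)
```

A signature on the child covers the commit object's text, which names the tree and the parents only by ID. That is why the guarantee extends to history exactly as far as the hash function holds up.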

> All it means is that you put your signature over a particular state of the repo (and, by extension, its parent states).

That's technically true. Though in practice I think the implied social contract is that signing of a commit means you signal some kind of approval for the diff between the signed commit and its immediate predecessor(s).

I'm not 100% sure I understand your point, but it sounds like you're concerned about signing something using a weak hash function (i.e. where the hash of something is what actually gets signed)?

If that's the case, then my point is pretty simple: yes, SHA-1 is broken for signing untrusted input (due to weak collision resistance), but it is not broken (so far) for signing trusted input (due to strong preimage resistance).
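A toy sketch of the distinction, assuming a hash deliberately truncated to 16 bits so that both searches finish quickly. The function names and message formats are invented for the example, and this is generic brute force, not the cryptanalytic technique used against real SHA-1.

```python
import hashlib
from itertools import count

BITS = 16  # toy digest size; real SHA-1 has 160 bits

def toy_hash(data: bytes) -> int:
    """First BITS bits of SHA-1, as an integer."""
    digest = hashlib.sha1(data).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - BITS)

def find_collision():
    """Birthday search: the attacker picks BOTH inputs. Returns (a, b, tries)."""
    seen = {}
    for i in count():
        msg = b"msg-%d" % i
        h = toy_hash(msg)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg

def find_second_preimage(target: bytes):
    """The attacker must match the hash of an input they did not choose."""
    want = toy_hash(target)
    for i in count():
        msg = b"forged-%d" % i
        if toy_hash(msg) == want:
            return msg, i + 1

if __name__ == "__main__":
    a, b, tries = find_collision()
    print("collision after", tries, "hashes")
    forged, tries = find_second_preimage(b"reviewed, trusted content")
    print("second preimage after", tries, "hashes")
```

For an n-bit hash, the generic costs are on the order of 2^(n/2) hashes for a collision and 2^n for a preimage. The published SHA-1 attacks beat the generic bound for collisions only, which is the gap the "trusted input" argument leans on.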

My point earlier was primarily that the contents of a repository are generally trusted (via mechanisms like code review), and signing trusted content still works even with SHA-1.

Note that certificate signing vulnerabilities (which I assume is why TLS was mentioned?) usually rely on a malicious actor presenting one certificate and then presenting a different cert later; they can't arbitrarily fake existing certs from somebody else.

The analogous scenario for git repositories would be to have a malicious actor make a commit (or blob, tree, etc.) that could be swapped out for another. But if you already have malicious actors able to make commits in your repository, then the hash function doesn't matter: they can cause damage in many, many other ways.

> The analogous scenario for git repositories would be to have a malicious actor make a commit (or blob, tree, etc.) that could be swapped out for another. But if you already have malicious actors able to make commits in your repository, then the hash function doesn't matter: they can cause damage in many, many other ways.

The malicious actor can pose as a good-faith contributor and submit Pull Requests to your repository.

You review the code in the PR, and perhaps even prove it correct. Later on, the malicious actor can do the swapping trick. (E.g. by running a mirroring service for your repository.)

> You review the code in the PR, and perhaps even prove it correct. Later on, the malicious actor can do the swapping trick. (E.g. by running a mirroring service for your repository.)

Having a copy of code that is reviewable and then searching for a malicious collision is a preimage attack; extending two chosen prefixes (e.g. one "valid" and one "malicious") until they meet at a hash collision is how most practical (?) collision attacks work. The latter scenario produces large junk sections in the results, which should be obvious under even mild scrutiny.

If the reviewer misses the kilobytes of garbage in the middle of a file they're reviewing, then an attacker can just sneak malicious code in directly without requiring a hash collision.

If the project relies on an effectively unreviewable binary file that could hold kilobytes of junk (like some YAML files I've seen...), then that's already breaking the review process without requiring a hash collision.

Ignoring all of that, anybody grabbing code from an untrusted source is already vulnerable to whatever attacks that untrusted source wants to employ, with "exploiting hash collision" being one of the higher-effort attacks that could be mounted.

Essentially, any repository that would be vulnerable to any of the known hash collision attacks (via bad review, untrusted upstream, etc.) would be vulnerable to more mundane, easier attacks against the same weaknesses that do not depend on hash collisions.

> Having a copy of code that is reviewable and then searching for a malicious collision is a preimage attack;

No, it's not. You can sneak extra entropy into minor formatting choices, variable names, etc., or into exactly what you write in your commit messages. Or probably even the ordering of files in your directories. (I don't think the git protocol enforces that files have to be in e.g. alphabetical order.)
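A toy sketch of why that freedom makes it a collision search rather than a preimage search: the attacker generates many harmless-looking variants of both the benign and the malicious file, then looks for any pair that matches. This assumes a hash truncated to 16 bits and uses trailing whitespace as the only source of entropy; the code snippets and helper names are invented, and real SHA-1 collision attacks work differently (they are cryptanalytic and not plain brute force).

```python
import hashlib

BITS = 16  # toy digest size so the search finishes instantly

def toy_hash(data: bytes) -> int:
    """First BITS bits of SHA-1, as an integer."""
    return int.from_bytes(hashlib.sha1(data).digest()[:4], "big") >> (32 - BITS)

def variant(lines, index: int) -> bytes:
    """Encode `index` as 0-3 trailing spaces per line; the code's meaning is unchanged."""
    out = []
    for line in lines:
        index, pad = divmod(index, 4)
        out.append(line + " " * pad)
    return "\n".join(out).encode() + b"\n"

def find_colliding_pair(benign_lines, malicious_lines, n_variants=4096):
    """Return (benign_variant, malicious_variant) with equal toy hashes, or None."""
    table = {toy_hash(variant(benign_lines, i)): i for i in range(n_variants)}
    for j in range(n_variants):
        candidate = variant(malicious_lines, j)
        i = table.get(toy_hash(candidate))
        if i is not None:
            return variant(benign_lines, i), candidate
    return None

BENIGN = [
    "def check(user, password):",
    "    stored = lookup(user)",
    "    if stored is None:",
    "        return False",
    "    ok = verify(stored, password)",
    "    return ok",
]
MALICIOUS = BENIGN[:4] + ["    ok = True", "    return ok"]

if __name__ == "__main__":
    pair = find_colliding_pair(BENIGN, MALICIOUS)
    if pair:
        good, bad = pair
        print(toy_hash(good) == toy_hash(bad), good != bad)
```

Neither file is fixed in advance, so the attacker only needs roughly the square root of the hash space in variants on each side. The benign variant is the one submitted for review.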

> Ignoring all of that, anybody grabbing code from an untrusted source is already vulnerable to whatever attacks that untrusted source wants to employ, with "exploiting hash collision" being one of the higher-effort attacks that could be mounted.

I'm not sure. If your hash works fine, as long as someone trusted gives you the commit hash, anyone untrusted can give you the actual source.
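Roughly what I mean, as a minimal sketch: the ID arrives over a trusted channel, the bytes arrive from anywhere, and you recompute before accepting. It is shown for a single blob using Git's `blob <size>\0` header; a real checkout would walk from commit to tree to blobs, and the function names here are hypothetical.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """SHA-1 ID Git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

def accept_from_untrusted(content: bytes, trusted_id: str) -> bytes:
    """Return content only if it hashes to the ID obtained from a trusted party."""
    if git_blob_id(content) != trusted_id:
        raise ValueError("content does not match the trusted object ID")
    return content

if __name__ == "__main__":
    trusted_id = git_blob_id(b"hello world\n")  # stands in for an ID from a trusted source
    print(accept_from_untrusted(b"hello world\n", trusted_id))
```

This check is only as strong as the hash: if the untrusted side can produce two inputs with the same ID, it passes for both.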

And if you mean accepting PRs: accepting PRs from the untrusted internet is basically how open source works.