Hacker News
by eru 1267 days ago
> The analogous scenario for git repositories would be to have a malicious actor make a commit (or blob, tree, etc.) that could be swapped out for another. But if you already have malicious actors able to make commits in your repository, then the hash function doesn't matter: they can cause damage in many, many other ways.

The malicious actor can pose as a good-faith contributor and submit Pull Requests to your repository.

You review the code in the PR, and perhaps even prove it correct. Later on, the malicious actor can do the swapping trick. (Eg by running a mirroring service for your repository.)


> You review the code in the PR, and perhaps even prove it correct. Later on, the malicious actor can do the swapping trick. (Eg by running a mirroring service for your repository.)

Having a copy of code that is reviewable and then searching for a malicious collision is a preimage attack; extending two chosen prefixes (e.g. one "valid" and one "malicious") until they meet at a hash collision is how most practical (?) collision attacks work. The latter scenario produces large junk sections in the results, which should be obvious under even mild scrutiny.

If the reviewer misses the kilobytes of garbage in the middle of a file they're reviewing, then an attacker can just sneak malicious code in directly without requiring a hash collision.

If the project relies on an effectively unreviewable binary file that could hold kilobytes of junk (like some YAML files I've seen...), then that's already breaking the review process without requiring a hash collision.

Ignoring all of that, anybody grabbing code from an untrusted source is already vulnerable to whatever attacks that untrusted source wants to employ, with "exploiting hash collision" being one of the higher-effort attacks that could be mounted.

Essentially, any repository that would be vulnerable to any of the known hash collision attacks (via bad review, untrusted upstream, etc.) would be vulnerable to more mundane, easier attacks against the same weaknesses that do not depend on hash collisions.

> Having a copy of code that is reviewable and then searching for a malicious collision is a preimage attack;

No, it's not. You can sneak extra entropy into minor formatting choices, variable names, etc., or into exactly what you write in your commit messages. Or probably even into the ordering of files in your directories. (I don't think the git protocol enforces that files have to be in, e.g., alphabetical order.)
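To make the "extra entropy" point concrete, here is a toy sketch, not a real attack on SHA-1. It assumes a deliberately weak hash (SHA-1 truncated to 16 bits) so the search finishes instantly, and it uses trailing whitespace as the only "formatting choice". The names `toy_hash`, `variants`, `find_collision`, and the sample lines are all made up for illustration. Because the attacker writes both versions, this is a collision search over two sets of variants rather than a preimage search against a fixed file.

```python
import hashlib
from itertools import product


def toy_hash(text: str, hex_digits: int = 4) -> str:
    """SHA-1 truncated to a few hex digits: a weak stand-in hash (assumption)."""
    return hashlib.sha1(text.encode()).hexdigest()[:hex_digits]


def variants(lines):
    """Yield every version of `lines` with or without a trailing space per line."""
    for choice in product(("", " "), repeat=len(lines)):
        yield "\n".join(line + pad for line, pad in zip(lines, choice))


def find_collision(benign_lines, malicious_lines):
    """Return (benign_text, malicious_text) with equal toy_hash, or None."""
    seen = {}
    for text in variants(benign_lines):
        seen.setdefault(toy_hash(text), text)
    for text in variants(malicious_lines):
        match = seen.get(toy_hash(text))
        if match is not None:
            return match, text
    return None


# Hypothetical file contents: 12 lines give 2**12 whitespace variants each.
benign = [f"total += item_{i}" for i in range(12)]
malicious = benign[:-1] + ["total += item_11; send(secrets)"]

pair = find_collision(benign, malicious)
```

With 4096 variants on each side and only 65536 possible toy hash values, a match is overwhelmingly likely. Against full SHA-1 the same idea needs far more work and the specialised techniques of the published collision attacks, but the attacker's freedom to vary both documents is the same.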

> Ignoring all of that, anybody grabbing code from an untrusted source is already vulnerable to whatever attacks that untrusted source wants to employ, with "exploiting hash collision" being one of the higher-effort attacks that could be mounted.

I'm not sure. If your hash works fine, as long as someone trusted gives you the commit hash, anyone untrusted can give you the actual source.
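A minimal sketch of that trust model, assuming a SHA-1 repository and looking only at a single blob (real commits hash trees and metadata as well). Git's blob id is the SHA-1 of a `blob <size>\0` header followed by the content; the function names and sample contents below are hypothetical.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object id git assigns to a blob in a SHA-1 repository."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def accept_from_untrusted(content: bytes, trusted_id: str) -> bool:
    """Accept bytes from any source only if they match the trusted id."""
    return git_blob_id(content) == trusted_id


# The trusted party publishes only the id of the reviewed content.
reviewed = b"print('hello')\n"
trusted_id = git_blob_id(reviewed)

# An untrusted mirror may serve anything; only matching bytes are accepted.
from_honest_mirror = b"print('hello')\n"
from_evil_mirror = b"print('hello'); steal()\n"

honest_ok = accept_from_untrusted(from_honest_mirror, trusted_id)
evil_ok = accept_from_untrusted(from_evil_mirror, trusted_id)
```

This check is only as strong as the hash: if the mirror can produce different bytes with the same id, `accept_from_untrusted` passes them too, which is exactly the swapping trick.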

And if you mean accepting PRs: accepting PRs from the untrusted internet is basically how open source works.