Hacker News
by TrueDuality 819 days ago
I don't need to carry out an attack to prove that it's possible. The burden of proof is on you, since you're claiming this isn't an active security problem when the problem is well known and well understood. The only outstanding questions are how detectable, impactful, and available those attacks are.

Specifically, you need to counter at least one of the things in the following list:

* Hash security: SHA1 collisions are feasible to generate, and companies have been actively moving away from SHA1, with good reason, for at least seven years (https://security.googleblog.com/2017/02/announcing-first-sha..., https://www.howtogeek.com/238705/what-is-sha-1-and-why-will-...)

* Content generation: As I've already discussed, the contents you use to make that collision can be anything you want and can meet any requirements you can write a generator for. To counter this, you'd have to prove to me that no engineer can build a seeded random generator that uses a language's grammar to produce plausible tokens that compile, or simply use a language model (also seeded) to produce plausible code and comments. This is a _trivial_ thing to do.

* The attack: Git relies on a chain of hashes based on SHA1, and those hashes are computed over the complete files included in the repository. If you can generate a collision for a file in git's history, you can replace that file in its commit and all subsequent commits will remain valid. This is the attack everyone is worried about with git. The only thing that protects against it right now is the security of SHA1. Additionally, signatures on commits and tags DO NOT protect against this: they cover the commit or tag object (hashes, metadata, and message), which refers to files only by their hashes, not to the file contents themselves. The attacked files will still look like they came from a valid signed commit.
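To make the chain concrete, here is a minimal sketch of how git's SHA-1 object format derives ids: a blob id is `sha1("blob <size>\0" + bytes)`, a tree stores each file's 20-byte blob id, and a commit names its tree by id. The helper names, the `alloc.c` file, and the author identity/timestamp are made-up placeholders; real commits usually also carry `parent` lines, and a signed commit adds a `gpgsig` header whose signature covers (roughly) the rest of this payload.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Id as in git's SHA-1 object format: sha1(b"<kind> <size>\\0" + body)."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: dict[str, str]) -> bytes:
    """Flat tree of regular files: b"100644 <name>\\0" + 20 raw id bytes, sorted by name."""
    out = b""
    for name in sorted(entries):
        out += f"100644 {name}\0".encode() + bytes.fromhex(entries[name])
    return out


def commit_body(tree_id: str, message: str) -> bytes:
    # Hypothetical identity and timestamp; a real commit uses yours.
    ident = "A U Thor <author@example.com> 1700000000 +0000"
    return (f"tree {tree_id}\n"
            f"author {ident}\n"
            f"committer {ident}\n"
            f"\n{message}\n").encode()


blob_id = git_object_id("blob", b"ptr = calloc(SIZE, sizeof(long));\n")
tree_id = git_object_id("tree", tree_body({"alloc.c": blob_id}))
payload = commit_body(tree_id, "Add allocator")  # roughly what a signature covers
commit_id = git_object_id("commit", payload)
```

The file's bytes never appear in `payload`; only `tree_id` does, and the tree only holds `blob_id`. Any other content that hashed to the same `blob_id` would produce byte-identical tree and commit objects, so the commit id and any signature over it would be unchanged. The blob ids produced by `git_object_id("blob", ...)` should match `git hash-object` for the same bytes.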

The extra scary part of that attack is that the malicious file will not be visible to any existing checkouts: those clients believe they already have the correct object and will continue to show it. But anything that regularly does fresh clones, like a CI system that deploys to prod, will get the poisoned object. Even if it checks the signatures on every commit, it won't see this coming.
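A toy model of why old checkouts and fresh clones diverge, assuming a content-addressed store in which a client only downloads objects whose ids it does not already have (loosely how git fetch behaves). `Repo` and `toy_hash` are hypothetical; the byte-sum hash is deliberately broken so a real collision is easy to show, and it stands in for a SHA-1 collision.

```python
def toy_hash(data: bytes) -> str:
    # Deliberately weak stand-in for SHA-1: byte sum mod 2**16, so any
    # reordering of the same bytes collides. Never use this for real.
    return format(sum(data) % 65536, "04x")


class Repo:
    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        oid = toy_hash(data)
        self.objects.setdefault(oid, data)  # first copy of an id wins
        return oid

    def fetch_from(self, remote: "Repo") -> None:
        # Only objects whose ids are missing locally are transferred.
        for oid, data in remote.objects.items():
            self.objects.setdefault(oid, data)


original = b"x = a - b;"
poisoned = b"x = b - a;"  # same bytes reordered, so the same toy_hash

server = Repo()
oid = server.put(original)
dev = Repo()
dev.fetch_from(server)  # long-lived developer checkout

server.objects[oid] = poisoned  # attacker with write access swaps the object

dev.fetch_from(server)  # dev already has oid and keeps the original
ci = Repo()
ci.fetch_from(server)   # a fresh clone (e.g. CI) gets the poisoned bytes
```

After this runs, `dev.objects[oid]` is still the original while `ci.objects[oid]` is the poisoned variant, under the same id.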

So the security of all our git repos, our production environments, and every new developer's fresh clone is rooted in either write access to the repository OR the security of SHA1.

I would say that is a practical and useful attack. A faster hashing algorithm will EXACERBATE this problem, as you're almost always trading collision resistance for speed. For the same output size, any hashing algorithm whose hashes you can calculate faster is MORE vulnerable to brute-force collision attacks, not less.
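The speed argument can be made precise for generic (brute-force) collision search. By the birthday bound, an ideal n-bit hash needs about W evaluations before two outputs collide, and at a hash rate of R evaluations per second that takes time T. This only covers generic search; the published SHA-1 attacks linked above are cheaper because they exploit structural weaknesses.

```latex
W(n) \approx \sqrt{\tfrac{\pi}{2}\, 2^{n}} \approx 1.25 \cdot 2^{n/2},
\qquad
T(n, R) \approx \frac{W(n)}{R}
```

For SHA-1 (n = 160) that is about 1.25 · 2^80 evaluations; at fixed n, doubling R halves T.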

"Computationally astronomical" isn't a very good argument. 20 years ago SHA1's security was insane. These things get weaker over time and need to be periodically replaced, not because they're failing, but because increased computing capacity has fundamentally changed the assumptions the algorithm was designed under.
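A small illustration of both points (generated content and "it's a matter of resources"): a seeded generator produces plausible-looking C comments, and a birthday search finds two different comments whose SHA-1 agrees on the first `bits` bits. Truncating SHA-1 stands in for a weaker hash; nothing here attacks full SHA-1. The vocabulary lists and function names are hypothetical.

```python
import hashlib
import random

# Hypothetical vocabulary for plausible-looking C comments.
VERBS = ["flush", "retry", "validate", "release", "lock", "resize"]
NOUNS = ["buffer", "cache line", "request queue", "socket", "descriptor", "page table"]
WHEN = ["before returning", "after the callback", "on error", "under the lock"]


def plausible_comment(seed: int) -> bytes:
    rng = random.Random(seed)  # same seed always yields the same text
    # Embedding the seed keeps texts distinct, so a match is a real collision.
    return (f"/* {rng.choice(VERBS)} the {rng.choice(NOUNS)} "
            f"{rng.choice(WHEN)} (ref {seed}) */\n").encode()


def truncated_sha1(data: bytes, bits: int) -> int:
    # Keep only the top `bits` bits of the 160-bit digest.
    full = int.from_bytes(hashlib.sha1(data).digest(), "big")
    return full >> (160 - bits)


def find_collision(bits: int, max_tries: int = 1 << 22):
    """Birthday search: return (seed_a, seed_b, tries) or None."""
    seen: dict[int, int] = {}
    for seed in range(max_tries):
        h = truncated_sha1(plausible_comment(seed), bits)
        if h in seen:
            return seen[h], seed, seed + 1
        seen[h] = seed
    return None
```

Calling `find_collision(24)` typically finishes after a few thousand tries; by the birthday bound, each 2 extra bits of output roughly doubles the expected work, which is why the cost, not the method, is what keeps full-size hashes safe.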

Even granting the computationally astronomical argument, that's a matter of cost and resources, not practicality. It absolutely is practical if the result is worth the cost. What is the most famous git-based project? Maybe the thing it was originally designed to manage... Don't you think _any_ nation state would be happy to pay less than ~$100k USD (https://sha-mbles.github.io/) to get some malicious code into production builds of the Linux kernel? The kernel project specifically has extra manual checks and multiple "known good repos", with commits literally added by hand, to protect against this attack. It's practical, it's a problem, and it needs to be fixed.

If you still insist on a working example pay me $125k and I'll produce one for you.


If someone can change a committed file inside a git repository, the main problem is that your system is FUBAR. Let's say I'm the attacker and I'm inside: I can change committed files and I can generate a collision for each. If my goal is to deface the repository, I can insert a file with gibberish. For example, I have a file with source code:

... omitted ...

ptr=calloc(SIZE, sizeof(long));

... etc ...

then I replace it with:

aDjw'pfojqe'rf[24oijgfpoemgl;m,g02ir-9u13]9fu24[efgje2ioprn

Same sha1 hash.

But wait, why should I waste 1000 GPUs to deface a Git repository when I can simply delete it? If I can change the files, I can delete them. It's simply stupid.

An attack that makes sense is to change this:

ptr=calloc(SIZE, sizeof(long));

into this:

ptr=calloc(SIZE-10, sizeof(long));

Now I have a buffer overflow (BOF) with the same hash, and only a code review can find the fraudulent change.

This goes beyond "I make a collision by inserting commented gibberish", like this:

// adojwqf'pjqeworivhneq;lnvl;dqjnfvljeqrvneljvn

You have to insert a change that works and implements the attack while keeping it invisible.

Good luck with that. I also read some AI nonsense in other comments that I find to be Star Trek bullshit.

> If you still insist on a working example pay me $125k and I'll produce one for you

Even with a $100M budget, you can't.

But why would I even want to do that? I have access; I can replace the whole repo with one full of exploitable bugs!

So, to the initial question: "If I replace SHA-1 in Git with a newer hash, is that a security improvement?" I feel the answer is "NO".

Defacing git repositories doesn't even make sense. You won't mess with people's checkouts, and it's trivial to detect and identify the responsible party. It's the security equivalent of a child throwing a tantrum in their own room. You want to replace the repo? Everyone who comes after you and tries to push a change will immediately notice, like opening the door to the proverbial child's room. You're busted and you've accomplished nothing.

You want to inject malicious code yourself? When it gets caught, or the file is inspected or reviewed, you're busted.

This attack can get malicious code injected into a repository without it ever showing up in a PR, a code review, or any existing checkout (so the senior developers who would most likely notice the change will probably never receive it). It re-uses an existing trusted, known-good commit in your history, even its signature, to say "yeah, this has always been here, it's perfectly safe and hasn't been modified since the author wrote it".

This is far more subtle and sneaky, and extremely valuable as an attack vector (and it gets juicier, stay tuned) for getting targeted vulnerabilities and backdoors into specific software. This isn't a novel attack method; as I mentioned, the Linux kernel goes through a very rigorous process just to avoid this kind of attack.

Aside: You keep using gibberish for your counterexamples. You don't need garbage; that's the point I keep trying to hammer home. The added content can come from any generator and isn't constrained to living exclusively in comments. Garbage is what people use in examples of these attacks because it's the easiest, and if you can demonstrate the attack with garbage, it works with any generator. With garbage, you've made the point.

Back to the security issue. Now you have a poisoned repo that contains malicious code and is effectively undetectable through normal use. Meanwhile, your production artifacts, built from fresh clones, include that malicious code. It will stay there, unchanged and referenced, until someone makes _any_ change to the file you targeted (once again, git doesn't store diffs but whole files for each commit). That change might be something like a developer adding some print statements to diagnose why the CI system is failing.

When another change is made to that file, the evidence mostly vanishes, or at least becomes extremely obscured. There will be _some_ object in your repository with that SHA-1; whether it's the original or the malicious one depends entirely on when your clone was made.

On the receiving end, your best-case scenario is that the changed code doesn't work and causes weird bugs in your CI system that can't be reproduced in local checkouts and magically go away as soon as anyone tries to diagnose them. This capability is worth STUPID amounts of money, and I would be shocked if nation states weren't already using this technique selectively in the wild.

So how do you solve this problem?

* One of the inherent problems is that signatures don't actually cover the content of the commit. This is another regular complaint about git's behavior, and fixing it would let you side-step this issue using the existing signing infrastructure. This is a band-aid, but it's what most people argue for, as it's significantly less of a lift than changing the hash function. With that fix, if you're worried about the attack, you'd just have to sign your commits and tags. If you sign your commits NOW, without a change to git, you're still 100% vulnerable to this attack, and because the signatures will still be valid, it's likely to either make someone innocent look guilty of injecting a vulnerability, or make audits look less closely at the code because it came from a trusted source, causing more harm than good.

* Swap out SHA-1 for something that isn't as vulnerable to collision attacks. The problem is collision attacks. Let me say that again: the core issue is collision attacks. If you can create an identical-prefix or chosen-prefix collision, the security guarantees of the git ledger go away. You can't trust it. It needs to be replaced.

* If neither of those is an option for you, your third option is to adopt the Linux kernel's policies: releases are done directly from engineers' machines, from a trusted known-good repo that has patches added by hand by the most senior engineer.
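To illustrate the first option (a signature that covers file contents rather than git's SHA-1 ids), here is a hypothetical sketch: sign a manifest of SHA-256 digests of the actual file bytes, so swapping in a SHA-1-colliding blob changes what was signed. This is not an existing git feature; `content_manifest`, `sign`, `verify`, and the key are made up, and HMAC (a shared-secret MAC) stands in for a real GPG/SSH signature only to keep the example self-contained.

```python
import hashlib
import hmac


def content_manifest(files: dict[str, bytes]) -> bytes:
    # One "digest  path" line per file: SHA-256 of the real bytes,
    # independent of whatever ids git's SHA-1 object store uses.
    lines = [f"{hashlib.sha256(data).hexdigest()}  {path}\n"
             for path, data in sorted(files.items())]
    return "".join(lines).encode()


def sign(manifest: bytes, key: bytes) -> str:
    # Stand-in for a public-key signature in this sketch.
    return hmac.new(key, manifest, hashlib.sha256).hexdigest()


def verify(files: dict[str, bytes], signature: str, key: bytes) -> bool:
    expected = sign(content_manifest(files), key)
    return hmac.compare_digest(expected, signature)


KEY = b"hypothetical-release-key"
release = {"alloc.c": b"ptr = calloc(SIZE, sizeof(long));\n"}
sig = sign(content_manifest(release), KEY)

# The swapped-in file from the example above: even if git saw the same
# SHA-1 id, the content manifest differs, so verification fails.
tampered = {"alloc.c": b"ptr = calloc(SIZE-10, sizeof(long));\n"}
```

Here `verify(release, sig, KEY)` succeeds and `verify(tampered, sig, KEY)` fails. The protection then rests on SHA-256 and the signing key rather than on SHA-1.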