Hacker News new | ask | show | jobs
by chuckadams 154 days ago
Breaking this down, several of AWS's core repos like the JS SDK use an allowlist of which contributor ids can run workflow actions in their PRs. The list was a regex, contained several short ids, and wasn't anchored with ^$, so if it allowed user 12345, then any userid containing 12345 could run their own actions on the PR, including one that exfiltrated access tokens. So they spammed GH with user creation requests, got an id that matched, and they were in like Flynn.

Said tokens didn't have admin access, but had enough privileges to invite other users to become full admins. Not sure if they were rotated, but github tokens are usually long-lived, like up to a year. Hey, isn't AWS the one always lecturing us to use temporary credentials? To be fair, AWS did more than just fix the regex, they introduced an "approve workflow run" UI unto the PR process that I think GH is also using now (not sure about that).

5 comments

> Said tokens didn't have admin access, but had enough privileges to invite other users to become full admins.

Ah... Github permissions. What fun.

Github actually has a way to federate with AWS for short-lived credentials, but then it screws everything up by completely half-assing the ghcr.io implementation. It's only available using the old deprecated classic access tokens.

Right? How is it that you still need a PAT or a custom app installation to access a registry?
Yeah wow! Even most "trusted" contributors shouldn't have this level of access. Is there really no way of scoping tokens with more granularity?
Nope. The best we could do was to create a separate service that creates Docker tokens (using "docker login") and exposes a secure API.

Obviously, GitHub needs to just fix this nonsense. But I interviewed a couple of "senior" engineers from GitHub, and I have zero hope of that happening soon.

As a security dude I spend way too much of my time fixing missing anchors or unescaped wildcards in regex. The good news is that it's trivial to detect with static analysis tooling. The bad news is that broken regex is often used for security checks.
Sometimes I wish regexes were full matches by default and required prefixing and postfixing with `.*` to get the current behaviour
Java's Pattern.match() method works that way. Python has two separate methods: re.match auto-anchors, re.search does not.
a match isn't boolean, it's substring. the original (and more common) use-cases would become excessively verbose
> The list was a regex ...

Regexpes for security allow lists: what could possibly every go wrong uh!?

At least the vuln was old enough so that they couldn't blame AI for it, otherwise the article would read different ;)
Ironically (?) an AI code review would very likely have noticed the overly-permissive regex.
This doesn't really matter as long as they also find 10x more nits that create noise for the human reviewer.
This is a good point. On my GH I’ve disabled Copilot reviews because the vast majority of them are false positives, but I’m reconsidering that position as it might still be worth it to wade through the spurious reviews just to catch some real issues.
I filter for false positives with language like this:

    For each bug you find, write a failing test. Run the test to make sure it fails. If it passes, try 1-3 times to fix the test. If you can't get it to work, delete the test and move on to the next bug.
It's not perfect, you still get some non-bugs where the test fails because it's premises are wrong. Eg, recently I tossed out some tests that were asserting they could index a list at `foo.len()` instead of `foo.len() - 1`. But I've found a bunch of bugs this way too.
I take it this wasn’t Lua then?

> I tossed out some tests that were asserting they could index a list at `foo.len()` instead of `foo.len() - 1`.

Nice, I’ll give this a try
Another success story for Regexes! Let's keep using this cryptic mess!
I met regexes when I was 13, I think. I spent a little time reading the Java API docs on the language's regex implementation and played with a couple of regex testing websites during an introductory programming class at that age. I've used them for the rest of my life without any difficulty. Strict (formal) regexes are extremely simple, and even when using crazy implementations that allow all kinds of backreferences and conditionals, 99.999% of regexes in the wild are extremely simple as well. And that's true in the example from TFA! There's nothing tricky or cryptic about this regex.

That said, what this regex wanted to be was obviously just a list. AWS should offer simpler abstractions (like lists) where they make sense.

> That said, what this regex wanted to be was obviously just a list. AWS should offer simpler abstractions (like lists) where they make sense.

Agree. I would understand if there was some obvious advantage here, but it doesn’t really seem like there is a dimension here where regex has an advantage over a list. It’s (1) harder to implement, (2) harder to review, (3) much harder to test comprehensively, (4) harder for users to use (correctly/safely).

Presumably the advantage was ease and speed of developing the filtering feature.

Wrong tradeoff, to be sure.