Hacker News
by Ciantic 43 days ago
What I want to focus on is the mental model of your CI pipeline and the problem with too much YAML. Consider this quote:

> Cache scope is per-repo, shared across pull_request_target runs (which use the base repo's cache scope) and pushes to main. A PR running in the base repo's cache scope can poison entries that production workflows on main will later restore.

This is very difficult to understand and to teach to new people, because everything is configured as YAML, yet behind the scenes everything is laid out as directories and files.

What if your CI pipeline was an old-school bash script instead? It would be far more obvious to a greater number of people how it works and what is left behind by other runs. We know how directories and files work in bash scripts.

Could we go back to basics, manage pipelines as scripts, and maybe even run a small server?

11 comments

> What if your CI pipeline was old-school bash script instead?

It doesn't matter if the cache is accessed through `actions/cache` in YAML or `curl -X POST $GITHUB_CACHE_URL < wololo.exe` or whatever. The fundamental problem is that "cache scope is per-repo."

I cannot fathom why they chose to support this at all, let alone make it the default for any action trigger. Any writeable data should be scoped to users/groups and require credentials. It should be impossible to write to a shared cache without explicitly granting permissions to the user triggering the action.
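To make the scoping idea concrete, here is a minimal sketch on a plain filesystem. It is not how GitHub's cache works; `CACHE_ROOT`, `cache_scope`, `cache_save` and `cache_restore` are hypothetical names made up for illustration. The assumption is that only pushes to `refs/heads/main` count as trusted, each scope writes only to its own directory, and trusted runs never read what untrusted runs wrote.

```shell
#!/bin/sh
# Hypothetical layout: $CACHE_ROOT/<scope>/<key>
CACHE_ROOT="${CACHE_ROOT:-/tmp/ci-cache}"

# Map a trigger and ref to a trust scope. Anything that is not a push
# to main is treated as untrusted, so PR runs never share a directory
# with main.
cache_scope() {
    trigger="$1"; ref="$2"
    if [ "$trigger" = "push" ] && [ "$ref" = "refs/heads/main" ]; then
        echo "trusted"
    else
        echo "untrusted"
    fi
}

# Save a file under the caller's own scope only.
cache_save() {
    scope="$1"; key="$2"; file="$3"
    mkdir -p "$CACHE_ROOT/$scope"
    cp "$file" "$CACHE_ROOT/$scope/$key"
}

# Restore an entry. Untrusted runs may fall back to reading the trusted
# cache, but trusted runs never read entries written by untrusted runs.
cache_restore() {
    scope="$1"; key="$2"; dest="$3"
    if [ -f "$CACHE_ROOT/$scope/$key" ]; then
        cp "$CACHE_ROOT/$scope/$key" "$dest"
    elif [ "$scope" = "untrusted" ] && [ -f "$CACHE_ROOT/trusted/$key" ]; then
        cp "$CACHE_ROOT/trusted/$key" "$dest"
    else
        return 1
    fi
}
```

With this layout a `pull_request_target` run resolves to the `untrusted` scope, so whatever it saves is simply a cache miss for a later build on main.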

And sure a PAT might leak through an env var not configured through secrets, but that's an understandable issue created by the user. I think most people are surprised their caches are world writeable with an innocuous actions trigger.

If I saw this in my CI script:

    curl -X POST $GITHUB_CACHE_URL < wololo.exe

It would make me pause, but now that it is a misfeature in YAML configuration it is more widely used. The point of bash scripts is that they are auditable and understandable.

I didn't prescribe what the bash script would be, because it would differ by use case. If I wanted to share artifacts from other runs, I would probably use podman and make sure I start new runs from a known good condition, but that's because I understand it. Someone else would use nix or whatever else.

The fundamental problem is that on GitHub Actions it's possible to give pipelines read-only permissions that are then violated, because runners can be granted read+write permissions to the cache. And they don't consider this a P0 bug.

So you don't even need to see questionable bash scripts to know there's a problem. The script would have already completed and pwned you by the time you see it.

With podman or nix you would have to poison the container registry/nix store which is more difficult, but you're also probably using your own runners.

My point though is that it's not bash or YAML here, but GitHub's default access controls. If you own your own runners and your own caching layer, then you're not going to be nearly as boneheaded as GitHub here. But GitHub pushes people towards their integrated solutions, which have horrible defaults.

This is a problem with all of devops imo: everything is a magic YAML config file, and they're very difficult to debug or reason about unless you _just know things_.

Because most modern development practices assume you work at a trillion dollar corporation and can subsidize very poor unscalable business practices. It's baffling, especially when modern solutions are worse at making maintainable software, not better, IMO.

Agree. It's unfortunate that people new to development are encouraged to embrace practices that large teams in big companies have had to adopt. It might make sense for career development, but it makes for a miserable development experience, especially for someone new to it, wanting to build something for themselves. No joy in it.

The other advantage with bash is that most developers can run it locally to validate what it is doing and debug issues. With GitHub Actions you always need to commit and push, slowing down the DX.

Shameless plug: solving this "push and pray" problem is something we have been focusing on with Dagger. It's an open-source CI platform that decouples the runtime from the triggers. The runtime is open source and local-first, so you develop the actual logic of your pipelines with a proper dev loop. Then, you separately wire up your git triggers. The same pipeline logic can be triggered locally or from git events.

IMO this is the only clean way to solve the problem. If you want to check it out and share feedback: https://dagger.io . We also have a very active Discord server full of CI nerds.

You should add an Easter egg in your CLI program: `dagger attack`, which prints out a favourite Top Gun quote.

Yep, I tried to use Act to get a sense of what our YAML was doing, but it failed to pull the Docker images and I gave up. There's not enough incentive to test locally when I can push to GH, yolo it, and hope the ops folks can help me figure it out.

Commit, Push, & Pray.

Fully agree. I was very confused trying to understand the attack.

There are so many things involved that a casual user will never get security right. Even if you are knowledgeable, it's very draining to catch up; securing all your workflows is hard work that is definitely NOT done at a glance, and you probably postpone it because of that.

If you have some sense for security, you will usually get nervous doing something stupid in a bash script. Well, unless you bury everything in thousands of abstractions.

This isn't true.

Our old Jenkins hosts were largely forever instances with forever credentials that were just waiting to take down the org.

Modern pipelines are orchestrators that run ephemeral execution environments with ephemeral credentials, which can significantly decrease the impact and timescale of getting pwned.

They're not perfect, but you can get pretty good posture by applying expertise to the subject. The problem, like always, is this expertise is neither valued nor rewarded.
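As a toy illustration of the "ephemeral execution environment" idea only: the sketch below runs a command in a throwaway working directory that is deleted afterwards, so nothing on disk carries over to the next run. `run_ephemeral` is a made-up helper name, and this does nothing about credentials or host isolation, which is where real orchestrators do the heavy lifting.

```shell
#!/bin/sh
# run_ephemeral CMD [ARGS...]
# Runs CMD inside a fresh temporary directory and removes that directory
# afterwards, whether or not CMD succeeded. Returns CMD's exit status.
run_ephemeral() {
    workdir="$(mktemp -d)" || return 1
    # Subshell so the `cd` does not leak into the caller.
    ( cd "$workdir" && "$@" )
    status=$?
    rm -rf "$workdir"
    return "$status"
}
```

For example, `run_ephemeral sh -c 'touch leftover && ls'` lists `leftover` during the run, and the directory is gone once the call returns.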

Not sure cases like the cache poisoning here would be more obvious.

Unless your bash script setup doesn't have the functionality of pull_request_target, but then removing it also works.

Without caching, you build and rebuild and rebuild the same things over and over.

With caching, a less strictly managed stage, such as a routine PR build, can affect high-stakes stages such as production builds. But if your builds are reproducible, and dependencies pinned, it should not make a difference.
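One way to lean on pinning is to check whatever comes out of the cache against hashes committed to the repo before using it. A minimal sketch, assuming GNU coreutils `sha256sum` and a committed manifest in its output format; `verify_pinned` and the manifest file name are hypothetical.

```shell
#!/bin/sh
# verify_pinned DIR MANIFEST
# MANIFEST must be an absolute path to a file in `sha256sum` format
# (hash, two spaces, path relative to DIR). Returns non-zero if any
# listed file is missing or does not match its pinned hash.
verify_pinned() {
    dir="$1"; manifest="$2"
    ( cd "$dir" && sha256sum --check --quiet "$manifest" )
}

# Possible use after a cache restore (paths are illustrative):
#   verify_pinned "$restored_dir" "$PWD/deps.sha256" || rm -rf "$restored_dir"
```

If the check fails, the restored directory is treated as a cache miss and the build starts from scratch, so a poisoned entry costs time rather than integrity.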

Old-school scripts have the disadvantage of being slower than a nicer declarative approach, be it makefiles or docker build. Both provide a way to trace the build process in detail; observability is key.

The core issue is that the language is horrible to get compiling in a reasonable amount of time on a build server. On top of that, the way it is designed makes it bad at caching. That is why you have this "optimistic" caching to begin with.

Our solution is to build everything in Docker, which is about what you suggest, since it does not automatically share cache between branches. But it is slow.

I like a lot about nix, and this is one of those things: built derivations are addressed by the hash of their inputs, so without changing something about the inputs you (barring bugs) cannot get an incorrect or poisoned cache artifact.

That is not done because then it would be slow.

I don’t think that’s a very strong argument, but that’s the rationale for not having simpler, no-state-shared-between-runs pipelines everywhere I have worked.
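On the nix point above about addressing artifacts by the hash of their inputs: a rough analogue in a plain script is to derive the cache key from the contents of the input files. This is only a sketch, not how nix computes store paths; `input_key` is a hypothetical helper and GNU coreutils `sha256sum` is assumed.

```shell
#!/bin/sh
# input_key FILE...
# Prints a cache key derived only from the given input files (for
# example a lockfile and the build script). Changing any input, or
# adding or removing one, yields a different key.
input_key() {
    # Hash each file, sort the "hash  name" lines so argument order does
    # not matter, then hash that list to get a single 64-character key.
    sha256sum "$@" | sort | sha256sum | cut -d ' ' -f 1
}
```

A key like this keeps stale entries from being picked up by accident. It only helps against poisoning if writes under a given key are limited to builders you trust.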