by CameronNemo 1705 days ago
>it is racy to have more than one process writing to the toplevel hierarchy

Source? And define top level hierarchy? The top level isn't really special.

This is why locks exist btw.

I'm not convinced you have to manage cgroups from one process. That is not how most implementations do it.

>Processes cannot just freely break out of their cgroup, that wouldn't be secure.

Processes can freely migrate ("break out" if you insist on that terminology) if they have write access to one level above them in the tree. Systemd does not put you in that situation, but it could.

E.g. if you are in /user/<uid> you have nowhere to go, but if you are in /user/<uid>/default you can go to /user/<uid>/<container_runtime>/<unique_container_id>/default.
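
To make the migration rule concrete, here is a rough Python sketch. As I read the cgroup v2 documentation, a writer needs write access to cgroup.procs in the destination cgroup and in the common ancestor of source and destination. The uid 1000, the names "runtime" and "c1", and the /sys/fs/cgroup mount point are assumptions for illustration, and os.access is only an approximation of the kernel's permission check.

```python
import os
from pathlib import PurePosixPath

CGROUP_ROOT = "/sys/fs/cgroup"  # assumption: unified (v2) hierarchy mounted here


def common_ancestor(src: str, dst: str) -> str:
    """Return the deepest cgroup path that contains both src and dst."""
    shared = []
    for a, b in zip(PurePosixPath(src).parts, PurePosixPath(dst).parts):
        if a != b:
            break
        shared.append(a)
    return str(PurePosixPath(*shared)) if shared else "/"


def may_migrate(src: str, dst: str, root: str = CGROUP_ROOT) -> bool:
    """Approximate check: is cgroup.procs writable in both the destination
    and the common ancestor of source and destination?"""
    needed = [dst, common_ancestor(src, dst)]
    return all(
        os.access(os.path.join(root, cg.lstrip("/"), "cgroup.procs"), os.W_OK)
        for cg in needed
    )


if __name__ == "__main__":
    # Hypothetical paths mirroring the example above.
    src = "/user/1000/default"
    dst = "/user/1000/runtime/c1/default"
    print(common_ancestor(src, dst))
    print(may_migrate(src, dst))
```

With those paths the common ancestor is /user/1000, so the move only works if that level has been made writable for you.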

1 comment

I'm sorry, I don't understand what you're asking for specifically. The source is the cgroups API. Check the documentation I posted earlier, particularly this line:

"Because the resource control interface files in a given directory control the distribution of the parent’s resources, the delegatee shouldn’t be allowed to write to them."

You could add locking, but that would basically be doing what systemd/docker/runc do: adding a new API on top of cgroups, which is then available through D-Bus or whatever. The top-level hierarchy is the top of any cgroup tree. If you have two processes writing to it without synchronization, they can stomp over each other's values. You technically can go and run "sudo mkdir" inside your docker's toplevel cgroup, but that would probably break things.
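
For what the locking variant might look like, here is a minimal Python sketch using an advisory flock. The lock file path and the helper names are made up for illustration, and this is not how systemd, docker, or runc actually coordinate. Note that it only helps if every writer agrees to take the same lock, which is exactly the "new API on top" problem.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def exclusive(lock_path: str):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until other holders release
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def write_control(cgroup_dir: str, name: str, value: str, lock_path: str) -> None:
    """Write one value to a control file while holding the shared lock.
    Only writers that also go through this helper are serialized."""
    with exclusive(lock_path):
        with open(os.path.join(cgroup_dir, name), "w") as f:
            f.write(value)


if __name__ == "__main__":
    import tempfile

    # Demo against a scratch directory instead of a real cgroup.
    with tempfile.TemporaryDirectory() as d:
        write_control(d, "cpu.max", "50000 100000", os.path.join(d, ".lock"))
        with open(os.path.join(d, "cpu.max")) as f:
            print(f.read())
```

A process that writes to the control file directly, without taking the lock, still races with everyone else.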

"if they have write access to one level above them in the tree. Systemd does not put you in that situation, but it could."

Well, it does do that if you turn on delegation, by giving you your own sub-tree. It doesn't do it by default because most services are not container managers and don't need an additional sub-tree. If you don't turn on delegation, write access is not allowed by the cgroups API, see above. It would not really make sense to allow a child process to say "I am going to take 100% of the cpu controller now and you get none, sorry"; that would defeat the purpose of cgroups.
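
If you want to check from inside a service whether you were actually handed a sub-tree, a rough Python sketch is below. It parses the cgroup v2 entry ("0::<path>") out of /proc/self/cgroup and tests whether cgroup.procs in that directory is writable. The /sys/fs/cgroup mount point and the function names are assumptions, and a process running as root will see write access regardless of delegation, so treat this as a hint only.

```python
import os
from typing import Optional

CGROUP_ROOT = "/sys/fs/cgroup"  # assumption: unified (v2) hierarchy mounted here


def own_cgroup_v2(proc_cgroup_text: str) -> Optional[str]:
    """Extract the v2 cgroup path from the contents of /proc/<pid>/cgroup.
    Each line is 'hierarchy-ID:controller-list:path'; v2 uses '0::<path>'."""
    for line in proc_cgroup_text.splitlines():
        if line.count(":") < 2:
            continue  # skip blank or malformed lines
        hier_id, controllers, path = line.split(":", 2)
        if hier_id == "0" and controllers == "":
            return path
    return None


def has_writable_subtree(root: str = CGROUP_ROOT) -> bool:
    """Rough check: can this process write cgroup.procs in its own cgroup?"""
    with open("/proc/self/cgroup") as f:
        path = own_cgroup_v2(f.read())
    if path is None:
        return False
    return os.access(os.path.join(root, path.lstrip("/"), "cgroup.procs"), os.W_OK)
```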

What have you worked on in this space?
I used to consult for this kind of thing several years ago, but I don't anymore. The field has really just coalesced around K8s. I don't think there is much space for innovation anymore.