Hacker News
by CameronNemo 1705 days ago
I don't have a specific use case in mind. The possibilities are endless.

Delegating a cgroup doesn't require systemd, by the way. You can do it with mkdir().
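A minimal sketch of the mkdir() approach in Python, assuming a cgroup v2 hierarchy and a caller that already has write access to the parent cgroup. Following the kernel admin guide's description of delegation to a less privileged user, it creates the child directory and hands the directory plus its `cgroup.procs`, `cgroup.threads` and `cgroup.subtree_control` files to the delegatee. The helper name `delegate_cgroup` and the demo names are hypothetical.

```python
import os
import tempfile
from pathlib import Path

# Per the kernel's cgroup v2 admin guide, delegating to a less privileged
# user means granting write access to the directory and these files.
DELEGATED_FILES = ("cgroup.procs", "cgroup.threads", "cgroup.subtree_control")


def delegate_cgroup(parent: Path, name: str, uid: int, gid: int) -> Path:
    """Create parent/name and hand it to uid/gid (hypothetical helper)."""
    child = parent / name
    # On a real cgroup2 mount, mkdir() creates the cgroup and the kernel
    # populates its interface files.
    child.mkdir()
    os.chown(child, uid, gid)
    for fname in DELEGATED_FILES:
        path = child / fname
        if not path.exists():
            # Only happens outside cgroupfs (e.g. the demo below), where
            # nothing creates the interface files for us.
            path.touch()
        os.chown(path, uid, gid)
    return child


if __name__ == "__main__":
    # Demo against a scratch directory instead of a live cgroup tree,
    # chowning to ourselves so it works without root.
    with tempfile.TemporaryDirectory() as scratch:
        child = delegate_cgroup(Path(scratch), "my-runtime", os.getuid(), os.getgid())
        print(sorted(p.name for p in child.iterdir()))
```

On a real system the parent would be some directory under the cgroup2 mount that the caller may write to. Enabling controllers for the new subtree is a separate write to the parent's `cgroup.subtree_control`, which this sketch does not do.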

Systemd puts user processes in a part of the cgroups tree where they can't work freely, by default. You have to use a systemd unit or the D-Bus API to break out of that box. That is a design decision on systemd's part. If I was putting user processes in cgroups, I would give them space to work amongst themselves.
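For comparison, one way to ask systemd for such a delegated subtree without writing a unit file is `systemd-run --scope` with the `Delegate=yes` property. The Python sketch below only builds the command line; it assumes a running user manager (`--user`), and the unit name is a hypothetical placeholder.

```python
import shlex
from typing import List, Optional, Sequence


def delegated_scope_argv(command: Sequence[str], unit: Optional[str] = None) -> List[str]:
    """Build a systemd-run call that starts `command` in a transient scope
    with Delegate=yes, so it gets a cgroup subtree it may manage itself."""
    argv = ["systemd-run", "--user", "--scope", "-p", "Delegate=yes"]
    if unit is not None:
        argv.append(f"--unit={unit}")  # otherwise systemd generates a name
    argv.append("--")  # everything after this is the command to run
    argv.extend(command)
    return argv


if __name__ == "__main__":
    argv = delegated_scope_argv(["sleep", "60"], unit="my-runtime")
    print(shlex.join(argv))
    # To actually launch it: subprocess.run(argv, check=True)
```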


Well, you have to understand that my mental picture, from your past description, is FreeBSD running in KVM on NixOS running on managed GCP/AWS. That is a somewhat confusing and convoluted architecture to me, and I'm not sure what it's for or what possibilities it opens that can't be achieved with some other setup. You could probably simplify it and cut out some of those pieces. But if you meant something else, let me know.

AFAIK you actually should not be using mkdir, because it is racy to have more than one process writing to the toplevel hierarchy. It's only safe to do that from the cgroup manager. That's what I've seen with all the existing implementations anyway.

"Systemd puts user processes in a part of the cgroups tree where they can't work freely, by default. You have to use a systemd unit or the D-Bus API to break out of that box. That is a design decision on systemd's part."

It's a design decision that was made because of cgroups v2. You should really read systemd's documentation on this if you haven't; it describes in detail why this is the case. https://systemd.io/CGROUP_DELEGATION/

Edit: and also the kernel documentation on delegation. https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2...

In particular, any other container manager that wants delegation needs to do the same thing, using a similar mechanism that you initiate by configuring it somehow. (It doesn't have to be a unit file or a D-Bus call, obviously.) Processes cannot just freely break out of their cgroup, that wouldn't be secure. Sorry if you know all this already; maybe a reader doesn't. Or maybe this is helpful to you if you want to add this to startup, I don't know.

"If I was putting user processes in cgroups, I would give them space to work amongst themselves. "

OK, now you lost me again... isn't this exactly the purpose of cgroup delegation?

>it is racy to have more than one process writing to the toplevel hierarchy

Source? And how do you define the top-level hierarchy? The top level isn't really special.

This is why locks exist, btw.

I'm not convinced you have to manage cgroups from one process. That is not how most implementations do it.
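A minimal sketch of the locking approach, assuming every cooperating manager agrees on the same lock file (the path is hypothetical) and uses advisory `flock()` through Python's `fcntl` module (Unix only). The lock makes a multi-step update (create a child, set a limit, move a process) look atomic to the other lock holders; it does nothing against writers that skip the lock.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def exclusive_lock(lock_path: Path) -> Iterator[None]:
    """Hold an exclusive advisory flock() on lock_path inside the block."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until other holders release
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def create_limited_child(parent: Path, name: str, pid: int,
                         memory_max: str, lock_path: Path) -> Path:
    """Create parent/name, set memory.max and move pid in, all under one lock.

    On a real hierarchy memory.max only exists if the memory controller is
    enabled in the parent's cgroup.subtree_control.
    """
    with exclusive_lock(lock_path):
        child = parent / name
        child.mkdir(exist_ok=True)
        (child / "memory.max").write_text(f"{memory_max}\n")
        (child / "cgroup.procs").write_text(f"{pid}\n")
    return child


if __name__ == "__main__":
    # Demo in a scratch directory rather than a live cgroup tree.
    with tempfile.TemporaryDirectory() as scratch:
        root = Path(scratch)
        child = create_limited_child(root, "job1", os.getpid(), "536870912",
                                     root / "manager.lock")
        print((child / "cgroup.procs").read_text().strip())
```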

>Processes cannot just freely break out of their cgroup, that wouldn't be secure.

Processes can freely migrate ("break out" if you insist on that terminology) if they have write access to one level above them in the tree. Systemd does not put you in that situation, but it could.

E.g. if you are in /user/<uid> you have nowhere to go, but if you are in /user/<uid>/default you can go to /user/<uid>/<container_runtime>/<unique_container_id>/default.
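The relevant rule in the kernel's cgroup v2 guide is that moving a process requires write access to the destination's `cgroup.procs` and to the `cgroup.procs` of the common ancestor of the source and destination cgroups. A small Python sketch that computes those files for the example paths above; the `<...>` components are kept as literal placeholders, and paths are assumed to be relative to the cgroup2 mount point.

```python
from pathlib import PurePosixPath
from typing import Tuple


def common_ancestor(src: str, dst: str) -> PurePosixPath:
    """Deepest cgroup containing both src and dst (paths relative to the
    cgroup2 mount point, written as absolute paths)."""
    shared = []
    for a, b in zip(PurePosixPath(src).parts, PurePosixPath(dst).parts):
        if a != b:
            break
        shared.append(a)
    return PurePosixPath(*shared)


def procs_files_to_check(src: str, dst: str) -> Tuple[str, str]:
    """cgroup.procs files the migrating writer needs write access to."""
    return (
        str(PurePosixPath(dst) / "cgroup.procs"),
        str(common_ancestor(src, dst) / "cgroup.procs"),
    )


if __name__ == "__main__":
    src = "/user/<uid>/default"
    dst = "/user/<uid>/<container_runtime>/<unique_container_id>/default"
    print(common_ancestor(src, dst))
    for path in procs_files_to_check(src, dst):
        print(path)
```

For these two paths the common ancestor is `/user/<uid>`, so the move hinges on write access to `/user/<uid>/cgroup.procs`, i.e. one level above `default`.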

I'm sorry, I don't understand what you're asking for specifically; the source is the cgroups API. Check the documentation I posted earlier, particularly this line:

"Because the resource control interface files in a given directory control the distribution of the parent’s resources, the delegatee shouldn’t be allowed to write to them."

You could add locking, but that would basically be doing what systemd/docker/runc do: adding a new API on top of it, which is then available through D-Bus or whatever. The top-level hierarchy is the top of any cgroup tree; if you have two processes writing to it without synchronization, they can stomp over each other's values. You technically can run "sudo mkdir" inside Docker's top-level cgroup, but that would probably break things.

"if they have write access to one level above them in the tree. Systemd does not put you in that situation, but it could."

Well, it does do that if you turn on delegation: it gives you your own sub-tree. It doesn't do it by default because most services are not container managers and don't need an additional sub-tree. Without delegation, write access is not allowed by the cgroups API (see above). It would not really make sense to allow a child process to say "I am going to take 100% of the CPU controller now and you get none, sorry"; that would defeat the purpose of cgroups.
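To make that concrete: `cpu.weight` lives in each child's directory, but its value is used to split the parent's CPU time among siblings, which is why the quoted kernel text says the delegatee shouldn't write it. A simplified Python model, assuming all listed siblings are busy and ignoring `cpu.max` and other limits; the sibling names are hypothetical.

```python
from typing import Dict

# cgroup v2 cpu.weight range; the default weight is 100.
CPU_WEIGHT_MIN, CPU_WEIGHT_MAX = 1, 10000


def cpu_shares(weights: Dict[str, int]) -> Dict[str, float]:
    """Fraction of the parent's CPU each busy sibling gets under contention."""
    for name, w in weights.items():
        if not CPU_WEIGHT_MIN <= w <= CPU_WEIGHT_MAX:
            raise ValueError(f"{name}: cpu.weight {w} out of range")
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


if __name__ == "__main__":
    print(cpu_shares({"web": 100, "batch": 300}))
    # If a child could raise its own weight, it would crowd out its sibling:
    print(cpu_shares({"web": 100, "greedy": 10000}))
```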

What have you worked on in this space?

I used to consult on this kind of thing several years ago, but I don't anymore. The field has really just coalesced around K8s. I don't think there is much space for innovation anymore.