by NewJazz 1705 days ago
That is not true at all about cgroup2. Also think outside the box. Not everyone is using cgroups and namespaces. Some people are out there using gvisor, or KVM, or FreeBSD jails.

I'm not sure what you mean by it not being true. AFAIK that constraint was the main issue with getting Docker moved over to cgroupsv2. (Edit: some background here https://github.com/opencontainers/runc/pull/2113) It's fixed now though, so everything should work fine with systemd. If you aren't using cgroups and namespaces then you probably don't get much benefit from running a system like NixOS on bare metal either, so I'm having trouble figuring out what your use case would be. Any other immutable Linux setup would do, and it might even be less hassle.

Those other things you mention are confusing to me, gvisor and KVM are mostly orthogonal to container management. And FreeBSD jails don't work on Linux.

Nix works on more than just linux.

Google Cloud uses gvisor for their K8s offering; AWS and Fly.io use firecracker for their container offerings.

Cgroups in v2 can be delegated easily and cleanly. As well as namespaced. Systemd or no systemd. Systemd just makes your life harder if you want to do rootless containers without integrating with them.

"Nix works on more than just linux."

IIRC the GP comment was asking about NixOS, not Nix. If you have everything already going through managed K8s or firecracker then I don't understand what you are using NixOS for. You could just install Nix on some other distribution that uses whatever init/container setup you want.

"Cgroups in v2 can be delegated easily and cleanly. As well as namespaced."

Right but none of those other things that were mentioned support cgroups delegation at an OS service level, only systemd does. Unless they have added this recently and I missed it. And if you're just using this to run a hypervisor then you're bypassing all that completely.

I don't see what you mean by systemd making it harder; you have to do basically the same process in any container manager if you want delegation. This is part of the design of cgroupsv2, it's not something systemd came up with. I'm sorry if I'm asking stupid questions, but I honestly am really confused about what your use case is, and your explanations are just making me more confused, so maybe something got lost here.

I don't have a specific use case in mind. The possibilities are endless.

Delegating a cgroup doesn't take systemd, by the way. You can do it with mkdir().
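
For readers following along, here is a rough sketch of what "delegate with mkdir()" could look like. It assumes a cgroup2 mount and enough privilege to chown; the function name and arguments are made up for illustration. The three files are the ones the kernel cgroup-v2 documentation names for delegation.

```python
import os
from pathlib import Path

# Interface files the kernel cgroup-v2 docs list as the ones to hand over.
DELEGATE_FILES = ("cgroup.procs", "cgroup.threads", "cgroup.subtree_control")


def delegate_cgroup(parent: Path, name: str, uid: int, gid: int) -> Path:
    """Hypothetical helper: create parent/name and give it to uid:gid."""
    child = parent / name
    child.mkdir(exist_ok=True)  # on cgroupfs, mkdir() is what creates the cgroup
    os.chown(child, uid, gid)
    for fname in DELEGATE_FILES:
        f = child / fname
        if f.exists():  # the kernel populates these on a real cgroup2 mount
            os.chown(f, uid, gid)
    return child
```

On a real system `parent` would be a directory under the cgroup2 mount point (commonly /sys/fs/cgroup), and the caller would normally be root or an already-delegated owner of `parent`.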

Systemd puts user processes in a part of the cgroups tree where they can't work freely, by default. You have to use a systemd unit or the dbus api to break out of that box. That is a design decision on systemd's part. If I was putting user processes in cgroups, I would give them space to work amongst themselves.

Well you have to understand that my mental picture from your past description is FreeBSD running in KVM on NixOS running on managed GCP/AWS, which is a somewhat confusing and convoluted architecture to me and I'm not sure what it's for or what the possibilities are that can't be done with some other setup. You could probably simplify and cut out some of those pieces. But if you meant something else, then let me know.

AFAIK you actually should not be using mkdir, because it is racy to have more than one process writing to the toplevel hierarchy. It's only safe to do that from the cgroup manager. That's what I've seen with all the existing implementations anyway.

"Systemd puts user processes in a part of the cgroups tree where they can't work freely, by default. You have to use a systemd unit or the dbus api to break out of that box. That is a design decision on systemd's part."

It's a design decision that was made because of cgroupsv2. You should really read systemd's documentation on this if you haven't, it describes in detail why this is. https://systemd.io/CGROUP_DELEGATION/

Edit: and also the kernel documentation on delegation. https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2...

In particular, any other container manager that wants delegation needs to do this same thing using a similar mechanism that you initiate by configuring it somehow. (It doesn't have to be a unit file or dbus call obviously.) Processes cannot just freely break out of their cgroup, that wouldn't be secure. Sorry if you know all this already, maybe a reader doesn't. Or maybe this is helpful to you if you want to add this to startup, I don't know.

"If I was putting user processes in cgroups, I would give them space to work amongst themselves. "

Ok now you lost me again... is this not exactly the purpose of cgroup delegation?

>it is racy to have more than one process writing to the toplevel hierarchy

Source? And define top level hierarchy? The top level isn't really special.

This is why locks exist btw.

I'm not convinced you have to manage cgroups from one process. That is not how most implementations do it.

>Processes cannot just freely break out of their cgroup, that wouldn't be secure.

Processes can freely migrate ("break out" if you insist on that terminology) if they have write access to one level above them in the tree. Systemd does not put you in that situation, but it could.

E.g. if you are in /user/<uid> you have nowhere to go, but if you are in /user/<uid>/default you can go to /user/<uid>/<container_runtime>/<unique_container_id>/default.
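
A small sketch of that migration, for illustration only. It follows the rule in the kernel cgroup-v2 documentation that the writer needs write access to cgroup.procs in the destination and in the common ancestor of source and destination. The helper names are invented here, and the check ignores cgroup namespaces and other restrictions the kernel may apply.

```python
import os
from pathlib import Path


def common_ancestor(src: Path, dst: Path) -> Path:
    """Deepest cgroup directory that contains both src and dst."""
    return Path(os.path.commonpath([src, dst]))


def may_migrate(src: Path, dst: Path) -> bool:
    """Rough permission check: cgroup.procs must be writable in both the
    common ancestor and the destination (hypothetical helper)."""
    ancestor = common_ancestor(src, dst)
    return (os.access(ancestor / "cgroup.procs", os.W_OK)
            and os.access(dst / "cgroup.procs", os.W_OK))


def migrate(pid: int, dst: Path) -> None:
    """Writing a PID to cgroup.procs asks the kernel to move that process."""
    (dst / "cgroup.procs").write_text(f"{pid}\n")
```

With the paths from the example, the common ancestor of /user/<uid>/default and /user/<uid>/<container_runtime>/<unique_container_id>/default is /user/<uid>, which is the directory whose cgroup.procs the user would need to be able to write.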