by kevin_nisbet 1443 days ago
Another area to consider is kernel memory accounting in the cgroup. Kernel memory for sockets and the like can be counted against the cgroup / Kubernetes pod, so you shouldn't give 100% of the memory to the application if it needs to communicate or is busy on the network.
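To see how much of a pod's charge is kernel-side, one option is to look at the cgroup v2 `memory.stat` file, which reports counters such as `sock`, `slab` and `kernel_stack` alongside `anon` and `file`. A rough sketch follows; the function names and the sample numbers are made up for illustration, and the choice of which counters count as "kernel-side" is an assumption, not a complete list.

```python
# Counters treated as kernel-side here (an illustrative, incomplete selection).
KERNEL_KEYS = ("sock", "slab", "kernel_stack")


def parse_memory_stat(text):
    """Parse the 'key value' lines of a cgroup v2 memory.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def kernel_side_bytes(stats):
    """Sum the selected kernel-side counters, treating missing ones as zero."""
    return sum(stats.get(key, 0) for key in KERNEL_KEYS)


# Made-up sample content, in the same shape as memory.stat.
sample = """anon 900000000
file 50000000
kernel_stack 1000000
slab 30000000
sock 20000000
"""

stats = parse_memory_stat(sample)
print(kernel_side_bytes(stats))  # 51000000
```

On a real node the text would come from the pod's `memory.stat` under the cgroup filesystem instead of the sample string.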
2 comments

It's also possible to boot with kmem accounting disabled, and I recommend it. Yes, it makes the accounting approximate, but kmem accounting is fundamentally unfair. Random cgroups get victimized by owning random slabs, and kernel reclaim is a mess of bugs.
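For reference, the boot option in question is the `cgroup.memory=nokmem` kernel parameter. A small sketch of checking a kernel command line for it is below; the helper name is hypothetical and the example strings are invented.

```python
def kmem_accounting_disabled(cmdline):
    """Return True if the kernel command line contains cgroup.memory=...nokmem...

    The cgroup.memory= parameter takes a comma-separated option list,
    so 'nokmem' may appear next to other options such as 'nosocket'.
    """
    for token in cmdline.split():
        if token.startswith("cgroup.memory="):
            options = token.split("=", 1)[1].split(",")
            if "nokmem" in options:
                return True
    return False


# Invented example command lines.
print(kmem_accounting_disabled("ro quiet cgroup.memory=nokmem"))  # True
print(kmem_accounting_disabled("ro quiet"))                       # False
```

On Linux the string to check can be read from `/proc/cmdline`.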
This is exactly the reason for the default 10% buffer between what we tell Seastar it can have and what we request for the cgroup with K8s.
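The arithmetic behind such a buffer is simple. The sketch below assumes the 10% is taken off the cgroup limit to get the application's budget; the comment does not say which side the percentage is computed from, and the function name is hypothetical.

```python
def app_memory_budget(cgroup_limit_bytes, reserve_percent=10):
    """Return the bytes to hand to the application, leaving reserve_percent
    of the cgroup limit as headroom for kernel-side charges (sockets, slabs).

    Assumption: the reserve is a percentage of the cgroup limit.
    """
    if not 0 <= reserve_percent < 100:
        raise ValueError("reserve_percent must be in [0, 100)")
    # Integer arithmetic avoids floating-point rounding surprises.
    return cgroup_limit_bytes * (100 - reserve_percent) // 100


limit = 8 * 1024**3  # an 8 GiB cgroup limit, chosen only as an example
print(app_memory_budget(limit))  # 7730941132
```

With an 8 GiB limit this leaves roughly 819 MiB that the application is never told about.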

In one case we saw that the kernel was unable to allocate a TCP buffer, so it decided to OOMKill Redpanda.