Hacker News
Solving Out of Memory Issues in Linux at Redpanda (redpanda.com)
47 points by northstar702 1443 days ago
5 comments

I can't wait for swap support in k8s (at the pod level).

I've got a bunch of bursty workloads that aren't easy to predict, and when they're running at their peak, they're doing important stuff that I'd rather not have terminated. Over-provisioning is one way to handle it, but then I risk OOM-ing the entire node. Throwing more memory at it is another solution, but then we're paying a ton of money to let memory sit around unused.

Are there any KEPs?
This article is a little confusing, so I just want to clarify something for the audience. It makes it sound like OOM killing is asynchronous, but it is not. The OOM killer kicks in as soon as you try to realize (actually fault in) more memory than your cgroup's limit allows. The kernel will first attempt to reclaim memory, and if that fails, it will kill something. There isn't some grace period during which your cgroup can skate along over its limit.
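The reclaim-then-kill sequence described above is visible from userspace in the cgroup v2 `memory.events` file. Its `max` counter counts how often usage was about to exceed the limit (so reclaim ran), `oom` counts how often reclaim failed and an allocation was about to fail, and `oom_kill` counts tasks in the cgroup that were killed. A minimal sketch for reading it; the sample contents are made-up numbers, and the real file path depends on your cgroup layout.

```python
from typing import Dict


def parse_memory_events(text: str) -> Dict[str, int]:
    """Parse the flat 'key value' lines of a cgroup v2 memory.events file."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def was_oom_killed(text: str) -> bool:
    """True if at least one task in the cgroup was killed by the OOM killer."""
    return parse_memory_events(text).get("oom_kill", 0) > 0


# Illustrative contents only. On a real host you would read something like
# /sys/fs/cgroup/<cgroup>/memory.events (the exact path is an assumption).
SAMPLE_EVENTS = """low 0
high 0
max 12
oom 1
oom_kill 1
"""

if __name__ == "__main__":
    events = parse_memory_events(SAMPLE_EVENTS)
    print(events["max"], events["oom"], events["oom_kill"])
    print(was_oom_killed(SAMPLE_EVENTS))
```

A nonzero `oom_kill` with a climbing `max` is the signature of the synchronous path: the cgroup hit its limit, reclaim could not keep up, and the kernel killed a task right then.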
Another area to consider is kernel memory accounting in the cgroup. Kernel memory for sockets and the like can get charged to the cgroup / Kubernetes pod. So this is another area where you shouldn't give 100% of the memory to the application if it needs to communicate or is busy on the network.
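To see how much of a cgroup's charge is kernel-side rather than application heap, the cgroup v2 `memory.stat` file breaks usage down in bytes. For example, `sock` is memory used in network transmission buffers and `slab` is in-kernel data structures. The sketch below sums a few such fields; which fields count as "overhead" is our assumption, field availability varies by kernel version, and the sample values are invented.

```python
from typing import Dict

# memory.stat fields treated here as kernel-side overhead. Choosing this set
# is an assumption; some fields are missing on older kernels, hence .get().
KERNEL_SIDE_FIELDS = ("sock", "slab", "kernel_stack", "pagetables")


def parse_memory_stat(text: str) -> Dict[str, int]:
    """Parse the flat 'key value' lines of a cgroup v2 memory.stat file."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def kernel_side_bytes(text: str) -> int:
    """Sum the kernel-side fields that are charged to the cgroup."""
    stats = parse_memory_stat(text)
    return sum(stats.get(field, 0) for field in KERNEL_SIDE_FIELDS)


# Invented sample values for illustration (bytes).
SAMPLE_STAT = """anon 3221225472
file 104857600
kernel_stack 1048576
pagetables 8388608
sock 268435456
slab 52428800
"""

if __name__ == "__main__":
    print(kernel_side_bytes(SAMPLE_STAT) / 2**20, "MiB")
```

For the sample this comes to 315 MiB, memory the application never allocated itself but that still counts against the pod's limit.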
It's also possible to boot with kmem accounting disabled, and I recommend it. Yes, it makes the accounting approximate, but kmem accounting is fundamentally unfair. Random cgroups get victimized by owning random slabs, and kernel reclaim is a mess of bugs.
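On kernels that support it, kmem accounting is turned off with the `cgroup.memory=nokmem` boot parameter, and socket-buffer accounting separately with `cgroup.memory=nosocket` (the two can be combined as `cgroup.memory=nosocket,nokmem`). A small sketch that checks a kernel command line, i.e. the contents of `/proc/cmdline`, for those options; the helper names and the sample command line are hypothetical.

```python
from typing import Set


def cgroup_memory_options(cmdline: str) -> Set[str]:
    """Collect the comma-separated options given via cgroup.memory= on a kernel command line."""
    options: Set[str] = set()
    for token in cmdline.split():
        if token.startswith("cgroup.memory="):
            value = token.split("=", 1)[1]
            options.update(opt for opt in value.split(",") if opt)
    return options


def kmem_accounting_disabled(cmdline: str) -> bool:
    return "nokmem" in cgroup_memory_options(cmdline)


def socket_accounting_disabled(cmdline: str) -> bool:
    return "nosocket" in cgroup_memory_options(cmdline)


# Hypothetical example; on a real host read the text of /proc/cmdline instead.
SAMPLE_CMDLINE = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro cgroup.memory=nosocket,nokmem"

if __name__ == "__main__":
    print("kmem accounting disabled:", kmem_accounting_disabled(SAMPLE_CMDLINE))
    print("socket accounting disabled:", socket_accounting_disabled(SAMPLE_CMDLINE))
```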
This is exactly the reason for the default 10% buffer between what we tell Seastar it can have and what we request for the cgroup with K8s.
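A minimal sketch of that rule: take the memory given to the pod's cgroup, hand 90% of it to the Seastar process (for example via Seastar's `--memory` option), and leave the rest as headroom for kernel and socket memory charged to the same cgroup. The helper names are hypothetical and this is not Redpanda's actual code; it only does the arithmetic, reading the limit in the cgroup v2 `memory.max` format.

```python
from typing import Optional

RESERVE_PERCENT = 10  # the default buffer described in the comment above


def parse_memory_max(text: str) -> Optional[int]:
    """cgroup v2 memory.max holds a byte count or the literal 'max' (no limit)."""
    value = text.strip()
    return None if value == "max" else int(value)


def seastar_memory_bytes(cgroup_bytes: int, reserve_percent: int = RESERVE_PERCENT) -> int:
    """Memory to hand to the Seastar app, leaving reserve_percent for the kernel,
    socket buffers and other overhead that is charged to the same cgroup."""
    if not 0 <= reserve_percent < 100:
        raise ValueError("reserve_percent must be in [0, 100)")
    # Integer arithmetic avoids floating-point rounding on byte counts.
    return cgroup_bytes * (100 - reserve_percent) // 100


if __name__ == "__main__":
    limit = parse_memory_max("4294967296\n")  # 4 GiB, illustrative value
    if limit is None:
        raise SystemExit("cgroup has no memory limit")
    print(seastar_memory_bytes(limit))
```

For a 4 GiB cgroup this prints 3865470566, about 3.6 GiB for Seastar, leaving roughly 410 MiB for everything else the kernel charges to the pod.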

In one case we saw that the kernel was unable to allocate a TCP buffer, so it decided to OOMKill Redpanda.

Redpanda is just amazing. It's definitely a sleeper project... but I love having it as a secret weapon.
Say more? In what ways is it amazing?
TL;DR: set memory constraints for your k8s containers. Kubernetes 101.
True, it is harder when you need to maximize resource utilization. The k8s scheduler did what we requested, but Seastar and memory allocation in Redpanda showed us (via OOMs) that the pod sandbox has some overhead.
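The memory (and CPU) constraints a pod sets also determine its Kubernetes QoS class: Guaranteed when every container has CPU and memory limits equal to its requests, BestEffort when none are set, and Burstable otherwise. The kubelet uses the class when setting each container's `oom_score_adj`, so BestEffort pods are the most likely node-level OOM victims and Guaranteed pods the least likely. The sketch below applies those rules to a hypothetical, simplified pod shape (plain integers instead of quantity strings, init containers ignored), so treat it as an illustration rather than the kubelet's code.

```python
from typing import Dict, List

RESOURCES = ("cpu", "memory")

# Hypothetical, simplified container shape: {"requests": {...}, "limits": {...}}
# with plain integers (millicores for CPU, bytes for memory) instead of
# Kubernetes quantity strings such as "500m" or "1Gi".
Container = Dict[str, Dict[str, int]]


def effective_requests(container: Container) -> Dict[str, int]:
    """A missing request defaults to the limit when only a limit is set."""
    requests = dict(container.get("requests", {}))
    for resource, value in container.get("limits", {}).items():
        requests.setdefault(resource, value)
    return requests


def qos_class(containers: List[Container]) -> str:
    """Classify a pod from its containers' CPU and memory requests/limits."""
    guaranteed = True
    any_set = False
    for container in containers:
        limits = container.get("limits", {})
        requests = effective_requests(container)
        for resource in RESOURCES:
            if resource in limits or resource in requests:
                any_set = True
            # Guaranteed needs a limit for both resources, equal to the request.
            if resource not in limits or requests.get(resource) != limits[resource]:
                guaranteed = False
    if guaranteed and any_set:
        return "Guaranteed"
    return "Burstable" if any_set else "BestEffort"
```

For example, a container with only `{"limits": {"cpu": 1000, "memory": 2**30}}` comes out Guaranteed, because its requests default to those limits, while one that requests 1 GiB of memory but is allowed to burst to 2 GiB is Burstable.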