Solving Out of Memory Issues in Linux at Redpanda

	Solving Out of Memory Issues in Linux at Redpanda (redpanda.com)
	47 points by northstar702 1443 days ago

5 comments

kube-system 1443 days ago

I can't wait for swap support in k8s (at the pod level).

I've got a bunch of bursty workloads that are not easy to predict, and when they're running at their peak, they're doing important stuff that I'd rather not have terminated. Over-provisioning is one way to handle it, but then I risk OOM-ing the entire node. Throwing more memory at it is another solution, but then we're paying a ton of money to let memory sit around unused.

RafalKorepta 1443 days ago

Are there any KEPs?

kube-system 1443 days ago

https://github.com/kubernetes/enhancements/issues/2400

jeffbee 1443 days ago

This article is a little confusing, so I just want to clarify something for the audience. It makes it sound like OOM killing is asynchronous, but it is not. The OOM killer kicks in as soon as you try to realize more memory than your cgroup's limit. The kernel will first attempt to reclaim memory and if that fails it will kill something. There isn't some grace period during which your cgroup can skate along over its limit.
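To make the "no grace period" point concrete, here is a minimal Python sketch that reports how close a cgroup is to its limit. It assumes cgroup v2, where `memory.max` holds either a byte count or the literal `max`, and `memory.current` holds current usage in bytes. The `/sys/fs/cgroup` path and the helper names are illustrative assumptions, not something taken from the article.

```python
from pathlib import Path
from typing import Optional

# Assumed mount point of the cgroup v2 hierarchy as seen from inside a container.
CGROUP_DIR = Path("/sys/fs/cgroup")


def parse_limit(raw: str) -> Optional[int]:
    """Parse memory.max content: a byte count, or 'max' meaning no limit."""
    value = raw.strip()
    return None if value == "max" else int(value)


def headroom(current: int, limit: Optional[int]) -> Optional[int]:
    """Bytes left before the limit is reached; None when the cgroup is unlimited."""
    return None if limit is None else max(limit - current, 0)


def read_headroom(cgroup_dir: Path = CGROUP_DIR) -> Optional[int]:
    """Read the live values for this cgroup (only works on a cgroup v2 host)."""
    limit = parse_limit((cgroup_dir / "memory.max").read_text())
    current = int((cgroup_dir / "memory.current").read_text())
    return headroom(current, limit)
```

When the headroom reaches zero, the next allocation that cannot be satisfied by reclaim is the one that triggers the kill.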

kevin_nisbet 1443 days ago

Another area to consider is kernel memory accounting in the cgroup. Kernel memory for sockets and the like can get counted against the cgroup / Kubernetes pod. So this is another reason you shouldn't give 100% of the memory to the application if it needs to communicate or is busy on the network.
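A rough way to see how much of a pod's charge is kernel-side is to sum a few fields of the cgroup v2 `memory.stat` file. The sketch below is Python and parses the file's `key value` lines. The choice of `sock`, `slab`, and `kernel_stack` as the fields of interest is my assumption for illustration, and the function names are hypothetical.

```python
from typing import Dict, Iterable

# cgroup v2 memory.stat fields treated here as kernel-side memory.
# This selection is an assumption made for illustration.
KERNEL_KEYS = ("sock", "slab", "kernel_stack")


def parse_memory_stat(text: str) -> Dict[str, int]:
    """Turn 'key value' lines from memory.stat into a dict of byte counts."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            stats[parts[0]] = int(parts[1])
    return stats


def kernel_side_bytes(stats: Dict[str, int],
                      keys: Iterable[str] = KERNEL_KEYS) -> int:
    """Sum the selected fields; fields missing on this kernel count as zero."""
    return sum(stats.get(key, 0) for key in keys)
```

On a live system the input would come from something like `/sys/fs/cgroup/memory.stat` inside the container, again assuming cgroup v2.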

jeffbee 1443 days ago

It's also possible to boot with kmem accounting disabled, and I recommend it. Yes, it makes the accounting approximate, but kmem accounting is fundamentally unfair. Random cgroups get victimized by owning random slabs, and kernel reclaim is a mess of bugs.

benpope 1443 days ago

This is exactly the reason for the default 10% buffer between what we tell seastar it can have, and what we request for the cgroup with K8s.

In one case we saw that the kernel was unable to allocate a TCP buffer, so it decided to OOMKill Redpanda.
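As a back-of-the-envelope illustration of that buffer, here is a small Python sketch that splits a container memory limit into an application share and a reserve. It assumes the 10% is taken off the cgroup limit (rather than added on top of the application's share), which the comment does not spell out. The function name and parameters are hypothetical and not Redpanda's actual configuration.

```python
from typing import Tuple


def split_container_memory(limit_bytes: int,
                           reserve_percent: int = 10) -> Tuple[int, int]:
    """Return (application_bytes, reserve_bytes) for a given cgroup limit.

    The reserve is left for everything charged to the cgroup that the
    application's own allocator does not see, such as kernel socket buffers.
    """
    if not 0 <= reserve_percent < 100:
        raise ValueError("reserve_percent must be in [0, 100)")
    # Integer arithmetic avoids float rounding on large byte counts.
    reserve = limit_bytes * reserve_percent // 100
    return limit_bytes - reserve, reserve
```

For a limit of 1000 bytes this yields 900 for the application and 100 held back.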

nojito 1443 days ago

Redpanda is just amazing. It's definitely a sleeper project, but I love having it as a secret weapon.

mi_lk 1443 days ago

Say more? In what ways is it amazing?

somenewaccount1 1443 days ago

TL;DR: set memory constraints for your k8s containers. Kubernetes 101.

RafalKorepta 1443 days ago

True, it is harder when you need to maximize resource utilization. The k8s scheduler did what we requested, but seastar and memory allocation in Redpanda showed us (via OOM) that the pod sandbox has some overhead.
