Guide to OOMKill Alerting in Kubernetes Clusters

	Guide to OOMKill Alerting in Kubernetes Clusters (netice9.com)
	65 points by draganm 2038 days ago

8 comments

ec109685 2038 days ago

Another hidden issue is that as a container gets close to running out of memory, it furiously drops read-only pages from memory, only to need to read some of them back in moments later.

This pathological swapping behavior can impact other workloads on the system.

cgroups2 has better protections against this behavior.

pflanze 2038 days ago

How can you use cgroups2 for this? I know that BSD resource limits are basically useless for this, as they only allow limiting virtual memory, not RSS use.

EarlyOOM [1] is a configurable daemon that kills processes early enough to (hopefully) prevent thrashing. I'm using it on my Linux desktops (it has proven to catch my own programs' runaway memory usage before it risks locking up the development machine), but it may also be useful on servers. It logs to syslog but also can be configured to run a program on kill events.

[1] https://github.com/rfjakob/earlyoom, https://launchpad.net/ubuntu/+source/earlyoom, https://packages.debian.org/search?keywords=earlyoom

(Why would a user space OOM killer be necessary if the kernel has better information about the state of the world? I don't know the details, but my interpretation is that because people disliked OOM killing, the kernel devs made the kernel OOM killer trigger so late that it is largely useless. If that's true and thus a social problem, maybe it needs to be solved on that level, too.)

BTW in my experience, Linux 2.2 used to handle out of memory situations much more gracefully than any later kernel version.

ec109685 2035 days ago

memory.min will ensure it doesn't try to reclaim memory once it's a lost cause: https://lwn.net/Articles/752423/
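
For anyone wondering what cgroup v2 exposes here: next to settings such as memory.min, each cgroup directory has a memory.events file with "key value" counters, including oom_kill. Below is a minimal Python sketch of reading it. The directory you pass to `read_oom_kills` is an assumption (it depends on how your runtime lays out cgroups), and the sample contents are illustrative, not captured from a real system.

```python
from pathlib import Path


def parse_memory_events(text: str) -> dict[str, int]:
    """Parse the 'key value' lines of a cgroup v2 memory.events file."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not exactly "<name> <non-negative integer>".
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def read_oom_kills(cgroup_dir: str) -> int:
    """Return the oom_kill counter for one cgroup directory.

    cgroup_dir is a hypothetical path supplied by the caller.
    """
    text = Path(cgroup_dir, "memory.events").read_text()
    return parse_memory_events(text).get("oom_kill", 0)


# Illustrative file contents (made up for this example).
SAMPLE = "low 0\nhigh 12\nmax 3\noom 1\noom_kill 1\n"
print(parse_memory_events(SAMPLE)["oom_kill"])
```

Polling this counter and alerting when it increases is one way to notice kills inside a cgroup without parsing kernel logs.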

throwii 2038 days ago

Is there an issue on cgroups2 adoption for Kubernetes somewhere?

The_rationalist 2038 days ago

https://github.com/kubernetes/enhancements/blob/master/keps/...

throwii 2038 days ago

Thank you!

segmondy 2038 days ago

I think the issue is that your nodes have swap. Why would you have swap on container nodes? IMO, the idea with container management is to get predictability with resources. If you have 8 GB on a node, you know that the containers get 8 GB. You might not be able to tell exactly how it is divided, since that depends on configuration, but you know that once they collectively use 8 GB, that's it. Swap is going to mess things up really badly, in ways you can't even predict.

johncolanduoni 2038 days ago

Even without swap enabled or any explicit memory mapping, read-only pages from executables (code, read-only data) are mapped into the process's address space and may be evicted. Unless you explicitly lock those into RAM, they still behave somewhat like swapped memory does, except the pages don't need to be written back.

bpaliz 2038 days ago

At last year's KubeCon someone presented https://github.com/opsgenie/kubernetes-event-exporter

hagmonk 2038 days ago

Not going to help you in this case; Kubernetes does not log an OOMKilled event. Issue tracking this has been open for some time: https://github.com/kubernetes/kubernetes/issues/69676

Tracking OOMKilled counts via the kubelet also had an open issue which was closed without being fixed: https://github.com/kubernetes/kubernetes/pull/87856
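
Even without an event, the pod status does record the reason when a container itself is terminated for exceeding its memory limit: `reason: OOMKilled` under the container's `state` or `lastState`. A minimal Python sketch of checking that, assuming `pod` is the parsed JSON of a single pod (for example from `kubectl get pod <name> -o json`); the sample pod below is made up for illustration. As far as I know this only covers the case where the container exits, not a killed sub-process inside a container that keeps running.

```python
def oom_killed_containers(pod: dict) -> list[str]:
    """Return names of containers whose current or last state is OOMKilled."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        for state_key in ("state", "lastState"):
            terminated = status.get(state_key, {}).get("terminated")
            if terminated and terminated.get("reason") == "OOMKilled":
                names.append(status["name"])
                break  # count each container once
    return names


# Hypothetical pod status, trimmed to the fields the function reads.
SAMPLE_POD = {
    "status": {
        "containerStatuses": [
            {
                "name": "app",
                "state": {"running": {}},
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            },
            {
                "name": "sidecar",
                "state": {"running": {}},
                "lastState": {},
            },
        ]
    }
}

print(oom_killed_containers(SAMPLE_POD))
```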

linsomniac 2038 days ago

I have an Icinga2 monitor on all my hosts for "dmesg" output to include the OOM killer string, and alert on that. Very useful.
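
A rough Python sketch of that kind of check, for anyone who wants to build their own. It assumes the kernel log line contains "Killed process <pid> (<name>)", which is the wording I have seen for both global and memory-cgroup OOM kills; the exact text can differ between kernel versions, and the sample lines are made up.

```python
import re

# Assumed message shape: "... Killed process 4242 (java) ..."
OOM_KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines):
    """Return (pid, process_name) for every line that reports an OOM kill."""
    hits = []
    for line in lines:
        match = OOM_KILL_RE.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2)))
    return hits


# Illustrative log lines, not taken from a real host.
SAMPLE_LOG = [
    "[12345.678901] Memory cgroup out of memory: Killed process 4242 (java) "
    "total-vm:2048000kB, anon-rss:1024000kB",
    "[12346.000000] eth0: link up",
]

print(find_oom_kills(SAMPLE_LOG))
```

Feeding it the lines of `dmesg` output and alerting on a non-empty result gives roughly the monitor described above.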

dilatedmind 2038 days ago

Would it have been sufficient to alert on high memory usage? It might be reasonable to set an alert at, say, 70% RSS, as long as the pod does not pass this threshold and die before a metric can be sampled.

That "no such file or directory" looks to be coming from building a dynamic executable on Debian and trying to run it on Alpine.

draganm 2038 days ago

As for the first question: that wouldn't be enough. AFAIK mmap-ed pages are part of RSS, and it's quite usual for them to use up everything up to the memory limit (databases kind of rely on this 'feature'). None of that would provoke an OOMKill.
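
To put numbers on that, here is a small Python sketch comparing a raw-usage threshold with a "working set" threshold. It assumes the convention cAdvisor uses for its working-set metric (usage minus inactive file-backed pages); the function names and all figures are made up for illustration.

```python
MIB = 1024 * 1024


def working_set_bytes(usage: int, inactive_file: int) -> int:
    """Usage minus inactive file-backed pages, floored at zero."""
    return max(usage - inactive_file, 0)


def should_alert(usage: int, inactive_file: int, limit: int,
                 threshold: float = 0.7) -> bool:
    """Alert when the working set reaches the given fraction of the limit."""
    return working_set_bytes(usage, inactive_file) >= threshold * limit


# Hypothetical database-like container: 1 GiB limit, 950 MiB in use,
# of which 600 MiB is reclaimable file-backed memory.
limit = 1024 * MIB
usage = 950 * MIB
inactive_file = 600 * MIB

print(usage >= 0.7 * limit)                       # raw usage crosses 70%
print(should_alert(usage, inactive_file, limit))  # working set does not
```

A raw-usage alert would fire constantly for such a workload even though the kernel can still reclaim most of that memory, which is why a threshold alone is a noisy stand-in for an actual OOMKill signal.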

For the second comment: I've used the images the author has published on Docker Hub. Maybe there would've been a way to make it work, but if you take a look at the amount of code in missing-container-metrics, you will realise that I've used less time to write that than I would've spent debugging someone else's Docker build and golang code that is not really maintained.

linsomniac 2038 days ago

I mean that's fine if you're ok with 30% wasted memory... We just recently had to tune some JVM and monitoring settings because we do the initial and max heap allocation to around 90% memory. There's very little else going on.

jeppesen-io 2037 days ago

Thank you so much!!!! I've been trying to find something like this

m1keil 2038 days ago

What about healthchecks?

zimbatm 2038 days ago

They are complementary.

If a sub-process gets OOMKilled and the container doesn't die, then it's most likely that the parent process didn't handle that scenario. In which case the health-check wouldn't cover that issue.
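
For illustration, this is roughly what "handling that scenario" could look like in a parent process written in Python. The OOM killer delivers SIGKILL, and on POSIX `subprocess` reports a signal death as a negative return code. The sketch simulates the kill by having the child send SIGKILL to itself; `run_and_classify` is a hypothetical helper, and SIGKILL alone does not prove the cause was an OOM.

```python
import signal
import subprocess
import sys


def run_and_classify(cmd: list[str]) -> str:
    """Run a child process and classify how it ended (POSIX only)."""
    proc = subprocess.run(cmd)
    # A negative return code means the child was terminated by that signal.
    if proc.returncode == -signal.SIGKILL:
        return "killed-by-sigkill"
    if proc.returncode != 0:
        return "failed"
    return "ok"


# Stand-in for an OOM-killed worker: the child kills itself with SIGKILL.
SELF_KILL = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGKILL)",
]

print(run_and_classify(SELF_KILL))
```

A parent that sees this result can log it, bump a metric, or exit so that the container restarts, instead of carrying on silently with a missing worker.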

jacques_chester 2038 days ago

I don't think it would work to report an OOMKill. The process that would answer the healthcheck probe would be gone.

m1keil 2038 days ago

It's not clear to me what happens after the OOM. Does the init process restart the daemon? I would argue that it shouldn't.

If the process stops responding to a healthcheck, it's the scheduler's responsibility (k8s in this case) to handle it. Crashes in this case should be handled in a similar way, whether it's due to OOM or a bug.

Maybe I'm missing another scenario here?

dharmab 2037 days ago

> Does the init process restart the daemon?

In systemd, this depends on what the Restart option in the service unit is set to. The default is to not restart.

https://www.freedesktop.org/software/systemd/man/systemd.ser...

geoffbp 2038 days ago

This fails to load for me

uturingmachine 2038 days ago

Probably ran out of memory.

yyyk 2038 days ago

https://archive.is/DDHQT

draganm 2038 days ago

Sorry for that - seems I've under-estimated possible traffic. Scaled up the server a bit now.

hendry 2038 days ago

Scaled up a blog... lol

draganm 2038 days ago

When the blog is on a $5/month DO droplet and it does its own TLS termination that was necessary.

hendry 2038 days ago

Use serverless guys.

k8s is a train wreck of needless complexity for 99% of developers.

outworlder 2038 days ago

The hate on K8s is misplaced in this case. You should redirect it to containers. Other container orchestration mechanisms will encounter similar, if not the same, issues.

justincormack 2038 days ago

Running out of memory vs wasting memory is also an issue on serverless although somewhat less extreme. Part of the issue is no one knows how much memory their code might use, and we don’t have frameworks that adapt to available memory much. Resource constrained computing is hard.

gautamdivgi 2038 days ago

k8s is unfortunately the only way to maintain sanity if you want to maintain a multi-cloud environment. I don't relish the idea of duplicating functionality and maintaining code across both AWS Lambda and Azure Functions.

chokeartist 2038 days ago

While I acknowledge you probably have solved for your use-case... I can't help but hardcore LOL at your somewhat terse perspective! Dude... K8S just got mature!
