by puzzle 2928 days ago
Were the system pods using all that memory or just reserving it? It's not straightforward to scale them, because the node might run just your tiny Rust server or 20 high-traffic web apps. You don't want the log agent to keel over just because of the latter. GKE and many other Kubernetes deployments use something called addon-resizer to determine the CPU and RAM given to cluster services. The problem is that, typically, it scales based on node count, and the settings are usually conservative (i.e. generous) on the lower end, which is your case of a single node. I think it assumes clusters are all at least 10 to 15 nodes. On a test cluster, I see the metrics server using only 16MB of RAM, but it requests 104MB. Ironically, the autoscaling nanny in the same pod uses another 8MB.

This is a known issue that is not easy to solve in the general case. I think Tim Hockin ran a conversation at last year's KubeCon about how to autoscale on the very low end, with people like you in mind. The other use case he brought up is how to set up services in a Minikube cluster that might be running in a 2GB VM.
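
A rough sketch of the node-count scaling described above, assuming the request grows linearly with node count and is floored at some minimum assumed cluster size. The function name, the parameter names, and the numbers are all made up for illustration (picked so that a single node comes out at 104MB); they are not taken from addon-resizer's actual configuration.

```python
def scaled_request_mb(node_count, base_mb, per_node_mb, min_nodes):
    """Memory request that grows linearly with node count.

    Clusters smaller than `min_nodes` are treated as if they had
    `min_nodes` nodes, which is why a single-node cluster can end up
    with a request sized for a much larger one.
    """
    effective_nodes = max(node_count, min_nodes)
    return base_mb + per_node_mb * effective_nodes


# Hypothetical settings, chosen only to illustrate the effect.
BASE_MB, PER_NODE_MB, MIN_NODES = 40, 4, 16

for nodes in (1, 10, 16, 50):
    request = scaled_request_mb(nodes, BASE_MB, PER_NODE_MB, MIN_NODES)
    print(f"{nodes:>3} nodes -> {request}MB requested")
```

With a floor like this, every cluster below the minimum size gets the same request, so the overhead is proportionally worst on a one-node cluster.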


> Were the system pods using all that memory or just reserving it?

700MB was the sum of all the requested minimum RAM for all those service pods. So yeah, you're probably right that they're ceilings of sorts. Still, it's a bit crazy to see a logging service, whose job is merely to haul logs off to a different server, requesting 200MB.

I'm also bewildered by Container-Optimized OS's memory consumption. IIRC it was 500MB+ bare, doing nothing, as reported by top. I forget which, but I stood up either Debian Stretch or Ubuntu 18.04 and it was only ~200MB with Docker installed.

The logging service numbers are easily explained: the remote server might have transient failures, so the forwarder will cache stuff in memory (the alternative is to just stop reading the logs, but then you risk losing messages if the pod dies in the meantime). You complained about Go, but it doesn't help that the fluentd agent is written in Ruby. There's a new Go rewrite of it, but I don't think GKE or others use it yet.
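
To make the buffering trade-off concrete, here is a minimal sketch of a forwarder that keeps reading logs and holds them in memory while the remote end is down. This is not how fluentd is implemented; the class, its method names, and the drop-oldest policy are assumptions for illustration only.

```python
from collections import deque


class BufferingForwarder:
    """Hypothetical log forwarder with a bounded in-memory buffer."""

    def __init__(self, send, max_buffered):
        self.send = send              # callable; raises ConnectionError on failure
        self.max_buffered = max_buffered
        self.buffer = deque()
        self.dropped = 0              # records discarded because the buffer was full

    def submit(self, record):
        # Keep accepting records even while the remote is down; once the
        # buffer is full, discard the oldest record to cap memory use.
        if len(self.buffer) >= self.max_buffered:
            self.buffer.popleft()
            self.dropped += 1
        self.buffer.append(record)
        self.flush()

    def flush(self):
        # Send in order; stop at the first failure and retry on a later call.
        while self.buffer:
            try:
                self.send(self.buffer[0])
            except ConnectionError:
                return
            self.buffer.popleft()
```

The larger `max_buffered` is, the longer an outage the agent can ride out without losing messages, and the more RAM it has to request up front.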

Was COS a standalone GCE instance or GKE? In the latter case, memory will be used by the usual suspects: fluentd, kubelet, kube-proxy, docker, node-problem-detector. For both, there are also a few Google daemons in Python (ugh).