Hacker News new | ask | show | jobs
by randomswede 1665 days ago
Anecdotally, the main slowdown we saw of Go code running in Kubernetes at my previous job was not "GC stalls", but "CFS throttling". By default[1], the runtime will set GOMACSPROCS to the number of cores on the machine, not the CPU allocation for the cgroup that the container runs in. When you hand out 1 core, on a 96-core machine, bad things happen. Well, you end up with a non-smooth progress. Setting GOMACPROCS to ceil(cpu allocation) alleviated a LOT of problems

Similar problems with certain versions of Java and C#[1]. Java was exacerbated by a tendency for Java to make everything wake up in certain situations, so you could get to a point where the runtime was dominated by CFS throttling, with occasional work being done.

I did some experiments with a roughly 100 Hz increment of a prometheus counter metric, and with a GOMAXPROCS of 1, the rate was steady at ~100 Hz down to a CPU allocation of about 520 millicores, then dropping off (~80 Hz down to about 410 millicores, ~60 hz down to about 305 millicores, then I stopped doing test runs).

[1] This MAY have changed, this was a while and multiple versions of the compiler/runtime ago. I know that C# had a runtime release sometime in 2020 that should've improved things and I think Java now also does the right thing when in a cgroup.

2 comments

AFAIK, it hasn't changed, this exact situation with cgroups is still something I have to tell fellow developers about. Some of them have started using [automaxprocs] to automatically detect and set.

[automaxprocs]: https://github.com/uber-go/automaxprocs

Ah, note, said program also had one goroutine trying the stupidest-possible way of finidng primes in one goroutine (then not actyakly doing anything with the found primes, apart from appending them to a slice). It literally trial-divided (well, modded) all numbers between 2 and isqrt(n) to see if it was a multiple. Not designed to be clever, explicitly designed to suck about one core.