Hacker News new | ask | show | jobs
by lukax 1 hour ago
NUMA can cause really crappy performance. We deployed a Go based LLM gateway in Kubernetes deployed on a server with hundreds of CPU cores. We didn't explicitly set GOMAXPROCS so Go runtime scheduled goroutines over different CPUs and it constantly used 200% CPU and GC was causing latency spikes. Then we set GOMAXPROCS 8 and all performance issues went away. Until recently Kubernetes didn't work well with NUMA.
2 comments

There is one instance where the NUMA performance never disappoints: https://www.youtube.com/watch?v=Cqd1Gvq-RBY
Is this on AMD? I wonder if it's all to do with NUMA or their CCD architecture etc (well these days Intel and everyone also does it to some extent).
Hundreds of cores is likely two sockets and so you've got NUMA there.

Scaling to large core counts has a lot of gotchas.