by jakupovic 206 days ago
Doing this at anything over 1k nodes is a pain in the butt. We decided to run many clusters of <100 nodes each rather than a few big ones.
3 comments

Same here. Control plane components that originate outside the Kubernetes project (your ingress controllers, service meshes, etc.) start failing beyond a certain scale. So I don't usually take node counts from these benchmarks seriously for our kind of workloads. We run a bunch of sub-1k node clusters.
Same. The control plane and various controllers just aren't up to the task.
Meh, I've had clusters with close to 1k nodes (w/ cilium as CNI) and didn't have major issues.
When I was involved about a year ago, cilium fell apart at around a few thousand nodes.

One of the main issues of cilium is that the bpf maps scale with the number of nodes/pods in the cluster, so you get exponential memory growth as you add more nodes with the cilium agent on them. https://docs.cilium.io/en/stable/operations/performance/scal...

That's true, and I definitely had to "tune" the bpf map limits, but it wasn't really that difficult to do.
Wouldn't that be quadratic rather than exponential?
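
To make the quadratic-vs-exponential point concrete, here's a rough sketch. It assumes every agent keeps one map entry per node in the cluster, which is a simplification for illustration and not cilium's actual map layout; `total_entries` and `entries_per_peer` are made-up names.

```python
def total_entries(nodes: int, entries_per_peer: int = 1) -> int:
    """Cluster-wide map entries, ASSUMING each agent tracks every node.

    Hypothetical model: per-agent state grows linearly with node count,
    and there is one agent per node, so the total grows as nodes**2.
    """
    per_agent = nodes * entries_per_peer  # linear in cluster size
    return nodes * per_agent              # one agent per node -> quadratic


if __name__ == "__main__":
    for n in (100, 1_000, 2_000):
        print(f"{n} nodes -> {total_entries(n)} entries cluster-wide")
```

Under that assumption, doubling the node count quadruples the cluster-wide total. Exponential growth would mean the total multiplies by a constant factor with every single node added, which is far worse than what this model describes.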