

	by jakupovic 206 days ago
	Doing this at anything > 1k nodes is a pain in the butt. We decided to run many clusters of fewer than 100 nodes each rather than a few big ones.

3 comments

kvrty 206 days ago

Same here. Control plane components that originated outside the Kubernetes project (your ingress controllers, service meshes, etc.) start failing beyond a certain scale. So I don't usually take node numbers from these benchmarks seriously for our kind of workloads. We run a bunch of sub-1k node clusters.

liveoneggs 206 days ago

Same. The control plane and various controllers just aren't up to the task.

preisschild 206 days ago

Meh, I've had clusters with close to 1k nodes (w/ cilium as CNI) and didn't have major issues.

__turbobrew__ 206 days ago

When I was involved about a year ago, cilium fell apart at around a few thousand nodes.

One of the main issues of cilium is that the bpf maps scale with the number of nodes/pods in the cluster, so you get exponential memory growth as you add more nodes with the cilium agent on them. https://docs.cilium.io/en/stable/operations/performance/scal...

preisschild 205 days ago

That's true and I definitely had to "tune" the bpf map limits, but it wasn't really that difficult to do.

oasisaimlessly 206 days ago

Wouldn't that be quadratic rather than exponential?
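
A rough way to see the difference, as a toy model only: assume every node runs one agent and each agent keeps a fixed number of map entries per node in the cluster. The function names and the `entries_per_node` parameter are made up for illustration and are not Cilium's actual map sizing.

```python
# Toy model (an assumption, not Cilium's real accounting): each agent
# holds `entries_per_node` map entries for every node in the cluster.

def per_agent_entries(nodes: int, entries_per_node: int = 1) -> int:
    """Entries held by a single agent: grows linearly with cluster size."""
    return nodes * entries_per_node


def cluster_total_entries(nodes: int, entries_per_node: int = 1) -> int:
    """Entries summed over all agents: one agent per node, so nodes * nodes."""
    return nodes * per_agent_entries(nodes, entries_per_node)


if __name__ == "__main__":
    for n in (100, 1_000, 5_000):
        print(n, per_agent_entries(n), cluster_total_entries(n))
```

Under this model, doubling the node count doubles each agent's entries and quadruples the cluster-wide total, which is quadratic growth. Exponential growth would mean the total multiplies by a constant factor for every single node added.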
