by KaiserPro 4338 days ago
Having deployed Salt to a medium-sized cluster (~1500 farm machines and around 1500 desktops), I can say the one thing Salt won't do is scale.

Salt has a lovely system where clients attach themselves to a ZeroMQ bus and listen for commands. However, after about 500 clients it starts to fail silently, and not all clients update properly.

The way we get round it is to run salt-call on the client at specific intervals. The other annoyance is that it is horribly slow (60+ seconds to run 100 ops, excluding yum operations).
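
For anyone wanting to try the same workaround, here is a rough sketch of a wrapper that could be run from cron on each client. It is an illustration, not what we actually run: the function names are made up, `state.highstate` is assumed to be the state run you want, and the per-host offset is an extra idea so that clients on the same cron schedule don't all start at the same second.

```python
import hashlib
import subprocess
import time


def stable_offset(minion_id: str, interval_s: int) -> int:
    """Map a minion id to a fixed offset in [0, interval_s).

    Hashing the id gives each host the same offset on every run,
    while different hosts land at different points in the interval.
    """
    digest = hashlib.sha256(minion_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % interval_s


def run_highstate() -> int:
    """Run salt-call locally and return its exit code.

    Assumes salt-call is on PATH and that state.highstate is the
    desired state run (an assumption, adjust to taste).
    """
    return subprocess.call(["salt-call", "state.highstate"])


def main(minion_id: str, interval_s: int = 900) -> int:
    """Sleep for this host's offset, then apply states once."""
    time.sleep(stable_offset(minion_id, interval_s))
    return run_highstate()
```

Cron would call `main()` with the host's own id once per interval; the 900-second default is an arbitrary placeholder.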

Having said that, the YAML syntax with optional Python extensions is grand. Whether it's quite ready for mainstream adoption is another matter. It sort of works for us.

2 comments

We have 2700 machines using a single Salt master.

You have to tune it or you have the "thundering herd" problem. There are two parameters if I recall:

* a delay between master queries.

* randomization of when to check with the master.

You have to get pretty liberal with these values to scale out, but I assure you, it does work.
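
To show why the randomization matters, here is a toy simulation rather than anything from Salt itself. It assumes every minion picks a uniformly random whole-second delay before contacting the master and counts the busiest second; `peak_checkins` and its parameters are hypothetical names for illustration only.

```python
import random
from collections import Counter


def peak_checkins(n_minions: int, max_delay_s: int, seed: int = 0) -> int:
    """Return the largest number of minions checking in during any one second.

    Each minion waits a random whole number of seconds in
    [0, max_delay_s] before contacting the master.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    per_second = Counter(rng.randint(0, max_delay_s) for _ in range(n_minions))
    return max(per_second.values())


# With no delay window, every minion hits the master in the same second.
herd = peak_checkins(2700, 0)
# With a 60-second window the same minions are spread over 61 slots.
spread = peak_checkins(2700, 60)
print(herd, spread)
```

The wider the window, the lower the peak the master has to absorb, which is why the values need to be fairly liberal at this scale.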

I would post a bug report. I know LinkedIn has over 10k nodes with SaltStack. Thomas was there tuning it, so I'm sure it should work.
It's something they know about. The current workaround is multiple masters, which isn't entirely practical.
Wikimedia runs 1,000+ nodes on a really small box. No need for multiple masters. Just increase your worker threads.