by deathanatos 4335 days ago
Please note that my only experience is with Salt and Fabric.

Salt falls short of what you want in the corner cases:

- We've found it's darn hard to upgrade. To be clear, we'd like to upgrade by transitioning the master to a new VM: this keeps things clean (we can provision our salt-master through a fabric script), and it also lets us change the amount of memory available. The minions, when disconnected, do not reconnect to the hostname in their config; instead, they endlessly reconnect to the IP that DNS resolved to when they were started. So you can't simply change a DNS record and have the minions move. Please note that we're a bit behind in releases (we're using 0.17.2, IIRC) because of the difficulty of upgrading.

- YAML was a terrible choice for "state" files, in my opinion. State files contain lists of commands to execute on a remote host being configured: trying to specify args to functions in YAML is awkward.

- I'm of the opinion that the master-minion relationship is backwards. I'd be much more interested in something where the master connected to the minions. In particular, this would help with upgrading (a minion could be controlled by two masters for a short period).

- The command line utilities are prone to user error: they exit with success even when states fail, and they return no output (and still exit with success) when your states take too long to run and the command gives up waiting. You can look up the job ID afterwards, but it's painful.

- The errors are utterly useless. In particular, Jinja rendering errors tend to reference incorrect locations in files, returning nonsense such as use of an undefined variable on a blank line.

- The output is useless too: you get a (very) verbose listing of everything that succeeded or failed. Telling if anything failed is the trick: it's buried in all the successes. (Terminal find is my friend here, but still, you have to be careful to watch out for boundaries between runs and not read an old run's output.) As discussed, the return code won't help you here.

- AFAICT, you need to be a particular user, and there are really no ACLs to speak of. All of our Salt stuff currently runs as a single user. People inevitably step on each other's toes.

- Non-responsive nodes are not mentioned in the output: they're treated the same as if they didn't exist! This results in some really wacky stuff happening. If you have variables that are lists of machines, a non-responsive machine simply won't be in the list. This means that if you need N of some type of machine and none of them respond, that list will be empty. (If you assumed the list to be non-empty, this often then triggers the aforementioned unreadable Jinja error output.)

- There is little capability for actual processing on the master itself. Sometimes, you need to coordinate the actions of several nodes together, such as generating keys for each node, and then distributing all keys to all nodes.
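
For the return-code, output, and non-responsive-node complaints above, here is a rough sketch of the kind of wrapper one could write around a run's results. It assumes you have already gotten the results into a Python dict shaped like minion id -> state id -> {"result": ..., "comment": ...}, and that you keep your own list of minions you expect to answer. The function names, the sample data, and that dict shape are my assumptions for illustration, not something Salt hands you in exactly this form.

```python
def summarize(results, expected_minions):
    """Return (failures, missing) for one parsed state run.

    Assumed (hypothetical) shape of `results`:
        {minion_id: {state_id: {"result": bool, "comment": str}}}
    """
    failures = []
    for minion, states in sorted(results.items()):
        if not isinstance(states, dict):
            # The minion answered with something other than per-state
            # results (for example an error message); count it as a failure.
            failures.append((minion, "<no per-state results>", str(states)))
            continue
        for state_id, ret in sorted(states.items()):
            # Only an explicit False counts as a failure here.
            if ret.get("result") is False:
                failures.append((minion, state_id, ret.get("comment", "")))
    # Minions we expected but that never showed up in the results at all.
    missing = sorted(set(expected_minions) - set(results))
    return failures, missing


def exit_code(failures, missing):
    """Non-zero if anything failed or any expected minion stayed silent."""
    return 1 if failures or missing else 0


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    sample = {
        "web1": {
            "pkg_nginx": {"result": True, "comment": "installed"},
            "svc_nginx": {"result": False, "comment": "failed to start"},
        },
        "web2": {"pkg_nginx": {"result": True, "comment": "installed"}},
    }
    failures, missing = summarize(sample, ["web1", "web2", "db1"])
    for minion, state_id, comment in failures:
        print("FAILED  {} {}: {}".format(minion, state_id, comment))
    for minion in missing:
        print("MISSING {}".format(minion))
    print("exit code would be", exit_code(failures, missing))
```

The point is only that failures and silent minions get pulled out of the wall of successes and turned into an exit status you can trust; how you get the results into that dict in the first place is left open.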