by rtpg 429 days ago
I mean, it feels pretty obvious to me that cell execution order is a real issue for a runbook with a bunch of steps if you're not careful.

I do think that, given the fragile nature of shell scripts, people tend to write their operation workflows in a pretty idempotent way, though...


agreed - we actually have a dependency system in the works too!

you can define + declare ordering with a dependency specification on the edges of the graph (i.e. A must run before B, but B can run as often as you'd like within 10 mins of A)
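To make the edge rule concrete, here's a rough sketch of how "A must run before B, and B is allowed for 10 minutes after A" could be checked. This is not the tool's actual dependency system; `Edge`, `Runbook`, `max_age_s` and the injectable `clock` are all made-up names for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class Edge:
    before: str       # step that must have succeeded first
    after: str        # step that depends on it
    max_age_s: float  # how long a success of `before` stays valid


class Runbook:
    def __init__(self, edges, clock=time.monotonic):
        self.edges = list(edges)
        self.clock = clock       # injectable so the window can be tested
        self.last_success = {}   # step name -> time of last success

    def blockers(self, step):
        """Return the reasons `step` may not run right now (empty if none)."""
        now = self.clock()
        reasons = []
        for edge in self.edges:
            if edge.after != step:
                continue
            ran_at = self.last_success.get(edge.before)
            if ran_at is None:
                reasons.append(f"{edge.before} has not run")
            elif now - ran_at > edge.max_age_s:
                reasons.append(f"{edge.before} ran more than {edge.max_age_s:.0f}s ago")
        return reasons

    def run(self, step, action):
        """Run `action` for `step` only if every incoming edge is satisfied."""
        reasons = self.blockers(step)
        if reasons:
            raise RuntimeError(f"cannot run {step}: " + "; ".join(reasons))
        result = action()
        self.last_success[step] = self.clock()
        return result
```

With `Runbook([Edge("A", "B", 600)])`, B can be run any number of times in the 10 minutes after A succeeds, and is refused before A has run or once the window has passed.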

There of course should be a way to override the dependency, by explicitly pressing a big scary "[I know what I'm doing]" button.

Another thing is that you'll need branches. As in:

  - Run `foo bar baz`
  - If it succeeds, run `foo quux`,
    Else run `rm -rf ./foo/bar` and rerun the previous command with the `--force` option.
  - `ls ./foo/bar/buur` and make certain it exists.

Different branches can be separated visually; one can be collapsed if another is taken.
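
For what it's worth, the branch above written out as plain code looks roughly like this. It's only a sketch: `foo` is the placeholder command from the list, `sh` and `branching_step` are hypothetical helper names, and appending `--force` to the end of the command is an assumption about where the flag goes.

```python
import os
import subprocess


def sh(argv):
    """Run a command; True means it exited with status 0."""
    return subprocess.run(argv).returncode == 0


def branching_step(run=sh, exists=os.path.exists):
    """Follow the success/recovery branches, then verify the result."""
    first = ["foo", "bar", "baz"]  # placeholder command from the list above
    if run(first):
        branch = "success"
        ok = run(["foo", "quux"])
    else:
        branch = "recover"
        run(["rm", "-rf", "./foo/bar"])
        # Assumption: --force is simply appended to the previous command.
        ok = run(first + ["--force"])
    # The list uses `ls`; an existence check stands in for it here.
    verified = exists("./foo/bar/buur")
    return {"branch": branch, "ok": ok, "verified": verified}
```

Returning which branch was taken is what would let a UI collapse the other one.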

Writing robust runbooks is not that easy. But I love the idea of mixing the explanatory text and various types of commands together.

I mean, is it worse than having it:

- in excel

- in a confluence document

- in a text file on your desktop

The use case this addresses is 'ad hoc activities must be performed without being totally chaotic'.

Obviously a nice one-click/trigger-based CI/CD deployment pipeline is lovely, but uh, this is the real world. There are plenty of cases where that's simply either not possible, or not worth the effort to set up.

I think this is great; if I have one suggestion it would just be integrated logging so there's an immutable shared record of what was actually done as well. I would love to be able to see that Bob started the 'recover user profile because db sync error' runbook but didn't finish running it, and exactly when that happened.
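
Something like the following is what I have in mind for the record, as a minimal sketch rather than a real design. `AuditLog` and its event names (`started`, `finished`) are invented here, and the hash chain only makes tampering detectable; it does not make the log truly immutable or shared.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(body):
    """Stable SHA-256 over an entry's fields (everything except its own hash)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    def __init__(self):
        self._entries = []

    def append(self, user, runbook, event, when=None):
        """Record an event, chaining it to the hash of the previous entry."""
        body = {
            "user": user,
            "runbook": runbook,
            "event": event,  # "started" or "finished" in this sketch
            "at": (when or datetime.now(timezone.utc)).isoformat(),
            "prev": self._entries[-1]["hash"] if self._entries else "",
        }
        self._entries.append({**body, "hash": _digest(body)})

    def verify(self):
        """Return True if no entry has been altered or reordered."""
        prev = ""
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

    def unfinished(self):
        """Map (user, runbook) -> start time for runs never marked finished."""
        open_runs = {}
        for entry in self._entries:
            key = (entry["user"], entry["runbook"])
            if entry["event"] == "started":
                open_runs[key] = entry["at"]
            elif entry["event"] == "finished":
                open_runs.pop(key, None)
        return open_runs
```

So after `log.append("bob", "recover user profile because db sync error", "started")` with no matching `finished`, `log.unfinished()` shows Bob's run and when it began.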

If you think it's a terrible idea, then uh, what's your suggestion?

I'm pretty tired of copy-pasting commands from confluence. I think that's, I dunno, unambiguously terrible, and depressingly common.

One-time scripts that are executed in a privileged remote container also work, but at the end of the day, those scripts tend to be specific and have to be invoked with custom arguments, which, guess what, usually turn up as a sequence of operations in a runbook: query db for user id (copy-paste SQL) -> run script with id (copy-paste to terminal) -> query db to check it worked (copy-paste SQL) -> trigger notification workflow with user id if it did (log in to X and click on button Y), etc.

I'm not against this notebook style; I have runbooks in Jupyter notebooks.

I just think it's pretty easy to do things like start a flow back up halfway through the book and not fix some underlying ordering issues.

With scripts that you tend to have to run top to bottom, you end up having to be more diligent about making sure the initial steps are still OK, because on every test you tend to run everything. Notebook-style environments favor running things piecemeal. Also very helpful! It introduces a much smaller problem in the process of solving the larger issue of making it easier to do this kind of work in the first place.