Hacker News
by orf 2517 days ago
1. Why are you running docker volume prune in production?

2. Why are you running docker on ad-hoc machines you need to prune?

3. Why do you even need root access on production machines to fiddle around with docker commands?

While this is obviously a bad bug (and there are many with Docker), it seems more of an operational procedures failure than anything else. You could be saying:

“Beware of rm -rf /, it just deleted 20gb of production data”

Ok. Sure. But why are your tools and procedures putting you in a position to make that mistake?


One of the most bothersome parts of HN is when someone tells us about something that happened and out comes a ream of second-guessing replies: "Why didn't you just do this?" and "Why didn't you just do that?" and any number of "It's so easy to just <thing> instead!"

We don't know his environment. We don't know his company's policies. We don't know his hardware, connectivity, or budget issues. These kinds of passive-aggressive responses are almost never helpful.

When you reduce it down, the title here is “giving people access to run arbitrary, manual and presumably unrestricted maintenance commands in production leads to issues”.

That’s not a surprise, and maybe the issue at the core here is not really Docker. That’s all.

I agree with both of you. It's not helpful to second-guess without knowing the context, and the OP wasn't necessarily in control of it. But at the same time, if you are someone in control of the context (which you aren't really if you are a line-level employee), you should be aware that this is a bad pattern. If you are a line-level employee and this is being imposed on you for some reason or other, you should sound the alarm if you know enough to: "hey, for the record - this is a monumentally bad idea - just saying".

I've seen plenty of stuff in my career where I've gone on record to say "hey - we really shouldn't do this". Nothing got done about it. But hey, I did what I could.

Recently I learned about Rasmussen's dynamic safety model, and I think it's a very handy mental model to have. It's the human factors that make what we do really hard. Line-level practitioners often know better than they are allowed to act on in practice, and fighting organizational politics to Do The Right Thing can be an uphill battle.

Sure, but regardless of people doing dumb things, it's still worth asking "why did Docker delete non-orphaned named volumes?" Though you could also question whether someone was mistaken about them not being "orphaned": you could probably arrange an unfortunate timing collision between someone running prune and a container being respawned.

Right, that's what raising an issue with the software maintainers is for.

Aside from anecdotes, there's little value in further discussion beyond the PSA that is the original post, save for how to prevent or recover from such events.
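On the prevention side, here is a minimal sketch of one mitigation for the timing collision mentioned above, assuming a POSIX shell and the standard `docker` CLI. Instead of letting `docker volume prune` choose what to delete at the moment it runs, the hypothetical helper `remove_reviewed_dangling_volumes` (not a Docker feature) lists the volumes Docker reports as dangling, asks for confirmation, and then removes exactly those names with `docker volume rm`, which refuses to delete a volume that is in use by a container. It narrows the window rather than closing it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Docker): review dangling volumes, then remove
# exactly the reviewed names instead of letting `docker volume prune` pick its own set.
remove_reviewed_dangling_volumes() {
  # Volumes that no container (running or stopped) currently references.
  volumes=$(docker volume ls -q --filter dangling=true) || return 1
  if [ -z "$volumes" ]; then
    echo "No dangling volumes."
    return 0
  fi
  echo "Dangling volumes:"
  # Assumes volume names contain no whitespace (Docker's local driver does not
  # allow it), so the unquoted word splitting below is intentional.
  printf '  %s\n' $volumes
  printf "Remove exactly these volumes? Type 'yes': "
  read -r answer
  if [ "$answer" != "yes" ]; then
    echo "Aborted."
    return 1
  fi
  # Without -f, `docker volume rm` refuses to delete a volume that is in use
  # by a container, e.g. one that got attached after the listing above.
  docker volume rm $volumes
}
```

A volume that a container picks up between the listing and the removal is kept, and a volume that becomes dangling after the listing is never touched, because the removal only ever sees the names you reviewed.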

It almost sounds as if the daemon was in the process of starting the containers when the prune command was issued. If it were run with `-f` and the container wasn't running, those volumes would be deleted. I tried this on a test system and couldn't reproduce the results described in the issue.

Well, no. `rm -rf /` is a completely different beast. For starters, it is documented and expected behavior.

They may have valid reasons to do that, even if it's uncommon.

Thinking of the `rm -rf` one, here's a fun take:

  export WORKDIR=home/me/proj
  ...
  # If WORKDIR is unset or empty here, the next line expands to: rm -rf /
  rm -rf /$WORKDIR
If something unsets $WORKDIR or never sets it at all, wave bye-bye to everything. And before you say "who would do that?!", I believe I heard this happened to a Red Hat build setup that also had some kind of force push plus auto-pull-and-build on its version control, so every connected person had their copy of the software nuked. Apparently, if not for the people who weren't connected, the entire software would have been lost. Or so the legend goes.

Also the Steam Linux client.
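Both stories come down to a path built from a variable that turned out to be unset or empty. A minimal sketch of the usual shell guard, assuming a POSIX `sh` (`clean_workdir` is a hypothetical name, not from the thread): the `${WORKDIR:?message}` expansion makes the shell print the message and stop when `WORKDIR` is unset or empty, so `rm` never runs with `/` as its target.

```shell
#!/bin/sh
# Hypothetical cleanup function illustrating the guard; not from the thread.
clean_workdir() {
  # "${WORKDIR:?message}" reports an error and stops the (non-interactive)
  # shell when WORKDIR is unset OR empty, so the path can never collapse to "/".
  # Quoting keeps a value containing spaces as a single argument, and "--"
  # stops rm from treating the path as an option.
  rm -rf -- "/${WORKDIR:?WORKDIR must be set and non-empty}"
}
```

In a script this aborts the whole script rather than just the function, which is usually what you want here. `set -u` alone would catch the unset case but not a variable that is set to the empty string, which is why the guard sits on the expansion itself.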