by user5994461 3500 days ago
Author from the quoted paragraph here.

0. The lifecycle of Docker containers is an extremely complex topic with limited documentation. It's safe to assume that it's out of reach for 9X% of readers here. One needs to fully understand the lifecycle of their containers to attempt to run databases in Docker, and that's a huge barrier to entry. Advising 100% of people to run production (i.e. permanent, long-lived) databases in Docker is terrible advice.

1. The entire concept of containers is based on being ephemeral. They do have storage (in /var/lib/docker/<cryptic-structure>) and they should be started with --rm to make sure that everything they did is cleaned up automatically after they exit. If you want to keep the data and build something around that, good luck with that!

2. Wrong. There is a truckload of magic going on here from filesystems to networking. Docker is hell to debug. A fucked database hidden away in Docker will be close to impossible to debug. If you're a sysadmin, you do not want to be in that position, trust me.

3. The odds of a database issue are at least 3 orders of magnitude higher when running within Docker. The Docker ecosystem is notoriously unstable and the filesystems are unreliable. (Plus, databases are IO intensive, which is going to trigger all the rare bugs and race conditions.)

Seriously. If you've got a brain cell at Docker Corp., PLEASE STOP overselling your product and advising it for absolutely everything without consideration for what people are doing.

Every time one of you guys advises running databases in Docker, you're contradicting everything that Docker stands for (i.e. statelessness). Not only is it confusing the hell out of people, but it's putting them on a guaranteed path to future catastrophic failures.

Running production databases inside docker. Just because it's not strictly impossible, doesn't mean it's possible.

    [See RFC1925 https://tools.ietf.org/html/rfc1925 ]
   (3)  With sufficient thrust, pigs fly just fine. However, this is
        not necessarily a good idea. It is hard to be sure where they
        are going to land, and it could be dangerous sitting under them
        as they fly overhead.

0. There is a plethora of documentation. Even the CLI suggests the lifecycle (start, stop, restart, pause, unpause).

1. This is simply not true. Your understanding is that they are based on being ephemeral, but this is not inherent in any sort of design of containers.

2. Magic is not really magic when you understand what's happening. Cgroups apply resource limits on a process, namespaces limit what a process can see. These come together to make containers. The host still has full visibility on these processes just like any other process on the system.

3. Do you have data to back this up? A container is just a process that is namespaced and resource limited. If you are writing to the copy-on-write filesystem provided for the container with a database, then you are doing it wrong (in 99% of cases). For that matter, you can even use ZFS for the container FS, which has been in use in production scenarios for quite some time... performance may not be great with ZFS here but integrity will be (not that I'm advocating for writing directly to the container FS... not at all, really).

There is nothing about Docker and statelessness. It can sure make cleaning up after a process a bit simpler but this doesn't mean that docker equates to statelessness.

Storage is hard whether you are in a container or not. Process isolation does not affect this.
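To make point 2 a bit more concrete: on Linux, the kernel exposes each process's namespaces and cgroup memberships under /proc, and the host can read them for a containerized process just as it can for any other. Below is a minimal sketch of that inspection, not anything Docker ships. The function name `describe_process` and the `proc_root` parameter are made up for illustration; only the `/proc/<pid>/ns` and `/proc/<pid>/cgroup` paths come from the kernel.

```python
import os


def describe_process(pid="self", proc_root="/proc"):
    """Return the namespaces and cgroup memberships reported for a process.

    `pid` may be a number or the string "self". On systems without a
    Linux-style /proc, both results simply come back empty.
    """
    base = os.path.join(proc_root, str(pid))
    info = {"namespaces": {}, "cgroups": []}

    # Each entry in /proc/<pid>/ns is a symlink such as "pid:[4026531836]".
    # Two processes in the same namespace show the same inode number.
    ns_dir = os.path.join(base, "ns")
    try:
        for name in sorted(os.listdir(ns_dir)):
            try:
                info["namespaces"][name] = os.readlink(os.path.join(ns_dir, name))
            except OSError:
                # Not readable (e.g. insufficient privileges); record the gap.
                info["namespaces"][name] = None
    except OSError:
        pass  # no ns directory: not Linux, or the process is gone

    # /proc/<pid>/cgroup lists one cgroup membership per line.
    try:
        with open(os.path.join(base, "cgroup")) as fh:
            info["cgroups"] = [line.strip() for line in fh if line.strip()]
    except OSError:
        pass

    return info


if __name__ == "__main__":
    report = describe_process()
    for ns_name, ns_id in report["namespaces"].items():
        print(ns_name, ns_id)
    for membership in report["cgroups"]:
        print(membership)
```

Run it on the host against the PID of a process inside a container (with enough privileges) and compare with a regular host process: the namespace identifiers and cgroup paths differ, and that is essentially the whole trick.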

0. That doesn't explain anything about what's happening underneath. It's far from enough to even form a mental model about Docker operations.

1. The statelessness, the ephemerality, and the tooling: it all goes together. Just because it's not enforced all the time at every level doesn't mean that it's a good idea to diverge from it.

2. What about the networking? the DNS magic? the storage? the filesystems? the lifecycle of data across containers & images and containers & further containers? the log management? the logging drivers? It would take multiple books to cover these topics.

3. Again, the filesystem and storage issues could fill an entire book. There are many blog posts and issues talking about that. ZFS only became available very recently and exclusively on Ubuntu; it's ridiculous to consider that a real-world scenario.

Docker equals statelessness. That's the only thing it's supposed to do and could do well. Maybe you should consider focusing on one use case that Docker does well (i.e. packaging & deploying stateless applications). That would make for better documentation, explanations, and goals ;)

(IMO. After reading your comments, it seems that you have no clue whatsoever about systems internals [or maybe we just don't communicate well on that]. That's scary if Docker itself doesn't have a clue about what it is nor what it should be.)

> For that matter, you can even use ZFS for the container FS, which has been in use in production scenarios for quite some time... performance may not be great with ZFS here but integrity will be

It's not a very good fit for a production database if "performance may not be great"?

> (not that I'm advocating for writing directly to the container FS... not at all, really).

> There is nothing about Docker and statelessness.

You just recommended against storing state in the container FS on the previous line. What kind of state are you advocating a container should keep (that is different from what is captured in the Dockerfile and any separate data volumes)?

> Storage is hard whether you are in a container or not. Process isolation does not affect this.

But abstraction does. Normally for a database, you'd have a mirrored set of SSDs and lots of RAM, spread over a couple of physical nodes. Maybe with a load balancer thrown in.

Or maybe you'd run your nodes as VMs, with iSCSI or some other NAS/DAS. I can't recall seeing reasonable advice on how to set up such a production system with Docker (but I haven't looked all that hard!).

Last time I checked, I couldn't find any suggestions for high-performance, well-tested container storage?

Depends on whether high performance is what you need, but this was just an example showing that even the container FS can have incredible integrity.

Why would a container keep you from using mirrored sets of SSDs, RAM, or an LB?

In the absolute worst case, you can set these up manually on your host and map the directories into the container.

In a better scenario, the various storage systems out there (EMC, NetApp, Ceph, you name it) have volume plugins that integrate with Docker, Kubernetes, etc.

How to handle storage in the container depends on your needs, just as if it were a VM or a physical machine... and ultimately the setup is, in the worst case, no different.
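For the worst-case option above (prepare the storage on the host, then map the directory into the container), here is a rough sketch of what the invocation could look like. It only builds the argument list for `docker run` with a `-v host:container` bind mount and does not execute anything. The helper name `docker_run_with_bind_mount`, the image name, and both paths in the usage example are placeholders I picked for illustration, not recommendations.

```python
import shlex


def docker_run_with_bind_mount(image, host_dir, container_dir, name=None, extra_args=()):
    """Build a `docker run` argument list that bind-mounts a host directory.

    host_dir is assumed to already exist on storage you trust (for
    example a mirrored SSD set); this helper does not create it.
    """
    # Relative paths are ambiguous in -v (Docker may treat them as volume names).
    if not host_dir.startswith("/") or not container_dir.startswith("/"):
        raise ValueError("both paths must be absolute")

    cmd = ["docker", "run", "-d"]  # -d: run detached
    if name:
        cmd += ["--name", name]
    # -v HOST:CONTAINER makes writes under container_dir land on the host
    # directory, bypassing the container's copy-on-write layer.
    cmd += ["-v", f"{host_dir}:{container_dir}"]
    cmd += list(extra_args)
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # Hypothetical values: adjust the image and paths to your own setup.
    cmd = docker_run_with_bind_mount(
        image="my-database-image",
        host_dir="/mnt/mirrored-ssd/dbdata",
        container_dir="/data",
        name="db",
    )
    print(" ".join(shlex.quote(part) for part in cmd))
```

Note there is deliberately no `--rm` here: the container is meant to stick around, and the data lives on the host path regardless of what happens to the container.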