Hacker News
by squibbles 1959 days ago
Definitely a good one: "Experiences with running PostgreSQL on Kubernetes - Gravitational - blog post 2018"

For anyone who thinks running a database in a container environment is a neat idea, think again. I am guilty of using containers for temporary test databases, but the thought of running production databases in containers sends shivers down my spine.

5 comments

This is exactly what AWS does with something like Aurora: https://awsmedia.awsstatic-china.com/blog/2017/aurora-design...

Not sure why you're so fearful. Bare metal machines can crash in weird ways and K8s containers can be just as reliable as the underlying host.

I don’t know why anybody would presume that a technology focused on ephemeral resource provisioning would be a suitable place to put your persistence layer...

That said, I don’t think it’s a sin at all to use it for testing. My default local dev setup is to use a Postgres container. But persistence is very much not required in that situation.

> I don’t know why anybody would presume that a technology focused on ephemeral resource provisioning would be a suitable place to put your persistence layer...

Kubernetes does more than that, and has features like PVCs + StatefulSets that are basically intended and designed for exactly this use case. If you see the HN comments[1], the top comment mentions this, and that the article waves it away for reasons related not to k8s but to "well, if the underlying storage is slow or not durable, then…" … yeah, then it doesn't matter if you're running k8s in the middle of it or not.

[1]: https://news.ycombinator.com/item?id=16207430
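
For anyone who hasn't seen the PVC + StatefulSet combination, here is a minimal sketch of what such a manifest might look like, built as a plain Python dict and dumped as JSON (which `kubectl apply -f` accepts alongside YAML). The names `pg`, `data`, and `pg-secret`, the `postgres:16` image tag, and the storage size are assumptions for illustration and come from nowhere in the article; a headless Service and the Secret are assumed to exist separately.

```python
import json


def postgres_statefulset(name="pg", replicas=1, storage="10Gi"):
    """Build a StatefulSet manifest in which each pod gets its own PVC."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # Assumption: a headless Service with this name is created elsewhere.
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "postgres",
                        "image": "postgres:16",  # illustrative tag
                        "ports": [{"containerPort": 5432}],
                        "env": [
                            # Keep the data in a subdirectory of the mount point.
                            {"name": "PGDATA",
                             "value": "/var/lib/postgresql/data/pgdata"},
                            # Hypothetical Secret holding the superuser password.
                            {"name": "POSTGRES_PASSWORD",
                             "valueFrom": {"secretKeyRef": {
                                 "name": f"{name}-secret", "key": "password"}}},
                        ],
                        "volumeMounts": [{
                            "name": "data",
                            "mountPath": "/var/lib/postgresql/data",
                        }],
                    }],
                },
            },
            # One PersistentVolumeClaim is created per pod from this template,
            # and it stays bound to that pod identity across reschedules.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(postgres_statefulset(), indent=2))
```

The part that matters for this discussion is `volumeClaimTemplates`: the claim outlives any individual pod, so a restarted or rescheduled pod reattaches to the same volume. Whether that volume is fast or durable is still up to the storage class behind it.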

Kubernetes was not initially designed with persistence in mind. If it was then the etcd of the master nodes would also be in containers.

There are (good) attempts at shoving it in, but the general advice I would give is that if you care about your data you should give it every possible chance to not be corrupted or disrupted; and that means keeping the number of abstractions and indirections low.

Everything comprising the control plane runs in a container, including etcd, apiserver, kube-controller and scheduler.

You can make it work, but why would you want to? Databases aren’t generally something that benefits from using container orchestration. They’re not usually highly dynamic, horizontally scaling systems. Generally you’d optimize that part of your system to maximize stability and consistency. For most typical use cases I can’t see the intuitive leap required to decide that all that additional complexity is necessary to attempt to replicate what you’d get from a few VPS. Unless you have a specialized use case, to me it just seems like very obviously the wrong tool for the job.

It’s not that I advocate running your own Postgres setup in your own cluster instead of just renting a managed version, but I’ve run a few databases on K8s and found it pretty fine. It’s useful when your hosting provider doesn’t support the database you want to run (Clickhouse managed AWS service when?) or for application-specific KV stores. EBS volumes and PVCs are great: solid performance, Kubernetes takes care of the networking, and it will resurrect the database if the worst happens and it does go down.

I probably could have those things on their own instance, but then I’d have to go through the hassle of networking, failover/recreation, deployments, etc., and for the vast majority of cases that’s 100% more effort than deploying a StatefulSet.

> (Clickhouse managed AWS service when?)

Now! Altinity runs Altinity.Cloud now in AWS. Feel free to drop by.

There are also services in other clouds. Yandex runs one in their cloud and there are at least 3 in China. ClickHouse has a big and active community of providers.

Disclaimer: I work for Altinity.

> Now! Altinity runs Altinity.Cloud now in AWS. Feel free to drop by.

Fantastic to hear! This is so exciting.

> Kubernetes does more than that, and has features like PVCs + StatefulSets that are basically intended and designed for exactly this use case

There is a significant impedance mismatch between this and the other K8s assumption that pods are ephemeral and can be relocated or restarted at any time.

I strongly support this advice having felt the pain.

Inherited a setup using a semi-well-known vendor's Patroni/Postgres HA operator implementation on OpenShift, and it was extremely fragile to any kind of network latency/downtime (due to its strong tie to the master API) or worker node outage/drainage/maintenance. These events would mean hours of recovery work hacking around the operator.

It was not my decision to place Postgres on OpenShift, and I will strongly discourage anyone planning to do this for production (or even testing). Please do not do it if you value your time and sanity. Spin up a replica set on VMs using one of the already production-ready and battle-hardened solutions or, if in the cloud, use a managed PostgreSQL service.

Some companies make a business out of it: https://www.crunchydata.com/products/crunchy-postgresql-oper...
What about single-host container (i.e. regular docker host, not k8s) with data partition mounted from host?
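
To make the question concrete, the setup I have in mind is roughly the following sketch. It only builds and prints the `docker run` command line rather than executing it. The host path `/srv/pgdata`, the container name, and the `postgres:16` tag are placeholders, not recommendations.

```python
import shlex


def docker_run_postgres(host_data_dir, name="pg", image="postgres:16", port=5432):
    """Return the argv for a single Postgres container with host-mounted data."""
    return [
        "docker", "run", "--detach",
        "--name", name,
        # Bring the container back after a daemon or host restart.
        "--restart", "unless-stopped",
        # Pass POSTGRES_PASSWORD through from the caller's environment;
        # it must be set there, or a fresh data directory will not initialize.
        "--env", "POSTGRES_PASSWORD",
        # Listen on loopback only; the app server runs on the same host.
        "--publish", f"127.0.0.1:{port}:5432",
        # Bind-mount the host partition over the image's data directory,
        # so the data survives removing or replacing the container.
        "--volume", f"{host_data_dir}:/var/lib/postgresql/data",
        image,
    ]


if __name__ == "__main__":
    print(shlex.join(docker_run_postgres("/srv/pgdata")))
```
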
For me, personally -- I cannot think of a sufficient justification to put a production database in a container. A good database server is designed for performance, reliability, scalability, security, etc., without containers. Putting a production database inside a container introduces a world of unnecessary edge cases and complexity.
Depends on requirements. Someone needs one big, highly optimized DB instance. Someone else needs a high-availability cluster of 3+ instances. Having a cluster of containers brings a performance penalty, but if your app is read-heavy, you can read from all instances and multiply read throughput...
Thanks for your feedback. I might run the DB on the host then, and just use containers for the app server. I'm not at the scale to warrant a separate host for the DB.
And using delegated, for local development, multiple versions with a few commands is really great.