by an-allen 2730 days ago
I’ve always been troubled by production-grade handling of state in containers - specifically as it pertains to data backup.

This module takes that into account - and defines a “backup k8s object” that will trigger a db dump. But there is still no way to get the point-in-time recovery/backup that you get from current production-grade managed state providers. I’m going to say it’s production-grade if we are using the standards of 10 years ago. Production-grade today, I feel, is a bit more robust.


https://github.com/zalando-incubator/postgres-operator supports point-in-time recovery just fine and is used in production for hundreds of databases at Zalando.
It would be good to know the size and scale of these databases.
I don’t have actual numbers, but I did a quick search: most are a few GiB to tens of GiB, although a few are hundreds of GiB. In practice size is not the limiting factor; IOPS are, because they all use gp2 EBS volumes. Databases that have huge IOPS requirements are still deployed outside of Kubernetes and run on i3 instances. In that case they still use spilo though, so basically the same system for backups and automatic failover as on Kubernetes.

That being said, we also have an ElasticSearch operator that is used to deploy ElasticSearch on Kubernetes; there the nodes run on i3 instances and the corresponding instance storage is used. Although it is used in production, that’s still very new and sadly not open source.
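
To make the gp2 remark above concrete, here is a rough sketch of why IOPS rather than size tends to be the limit for small databases. It assumes the gp2 baseline rule as AWS documents it (3 IOPS per GiB, floored at 100 and capped at 16,000; the cap has changed over time) and ignores burst credits entirely. The function names are made up for illustration and are not part of any AWS SDK.

```python
# Sketch only: constants reflect the gp2 baseline rule as documented by AWS
# (assumed here, check current docs); burst credits are deliberately ignored.
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000


def gp2_baseline_iops(size_gib: int) -> int:
    """Return the baseline IOPS of a gp2 volume of the given size (hypothetical helper)."""
    if size_gib < 1:
        raise ValueError("gp2 volumes are at least 1 GiB")
    return max(GP2_MIN_IOPS, min(GP2_MAX_IOPS, GP2_IOPS_PER_GIB * size_gib))


def gp2_size_for_iops(target_iops: int) -> int:
    """Return the smallest gp2 size in GiB whose baseline meets target_iops."""
    if target_iops > GP2_MAX_IOPS:
        raise ValueError("target exceeds what a single gp2 volume can sustain")
    if target_iops <= GP2_MIN_IOPS:
        return 1  # every gp2 volume gets the 100 IOPS floor
    return -(-target_iops // GP2_IOPS_PER_GIB)  # ceiling division


if __name__ == "__main__":
    for size in (10, 50, 500):
        print(f"{size:>4} GiB -> {gp2_baseline_iops(size)} baseline IOPS")
```

Under those assumptions a database of a few tens of GiB only gets a baseline of 100 to 150 IOPS, and you would have to provision roughly 1 TiB just to sustain 3,000 IOPS, regardless of how little data is actually stored.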

>"In that case they still use spilo though, so basically the same system for backups and automatic failover as on Kubernetes."

What is "spilo"? I am not familiar with this term. Thanks.

Spilo[1] is a Docker image that provides postgres bundled with Patroni[2].

Both the postgres-operator I linked earlier and our setup on AWS (with one image per EC2 instance) use it to actually run Postgres.

  [1]: https://github.com/zalando/spilo 
  [2]: https://github.com/zalando/patroni