Hacker News
by zumachase 2167 days ago
We're in the same camp with a cluster ~2x as large for Squawk[1], and it would cost us many multiples in the cloud (excluding our TURN relays, which aren't on k8s). However, the one killer feature that the cloud still has over self-hosted is the state layer. Nothing comes close to the turnkey, highly available, point-in-time-recoverable database offerings from the cloud providers. We're running the Spilo/Patroni helm charts, and we've really tried to break our setup Chaos Monkey style. But I'll admit I'd sleep better leaving it in Amazon's hands (fortunately, with all the money we save, we can afford multiple synchronous replicas and ship log files every 10 seconds).

[1] Shameless plug: Squawk, Walkie Talkie for Teams - https://www.squawk.to

_EDIT_ I've just read your blog post. We went the other direction: we use the local storage provisioner to create PVCs directly on host storage, and push replication up to the application layer. We run Postgres and Redis (KeyDB) with 3 replicas each, with at least one replica in synchronous replication (where supported), and we ship Postgres WAL files to S3 every 10 seconds.
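
For anyone wanting to picture the Postgres side of that setup, here is a rough sketch of what a Patroni-style config fragment could look like, written as a Python dict. It is not the poster's actual config. `synchronous_mode`, `archive_mode`, `archive_timeout` and `archive_command` are real Patroni/Postgres settings, but mapping "every 10 seconds" to `archive_timeout` and using WAL-G for the upload are assumptions, and the helper names are made up for illustration.

```python
def build_patroni_config(archive_interval_s: int = 10) -> dict:
    """Return a Patroni-style configuration fragment as a plain dict."""
    return {
        "bootstrap": {
            "dcs": {
                # Patroni setting: manage synchronous replication and only
                # fail over to a replica that was synchronous.
                "synchronous_mode": True,
                "postgresql": {
                    "parameters": {
                        "archive_mode": "on",
                        # Force a WAL segment switch at least this often (seconds).
                        # Assumption: this is how "every 10 seconds" is achieved.
                        "archive_timeout": archive_interval_s,
                        # Assumption: WAL-G does the upload; the S3 target would
                        # come from its own environment variables.
                        "archive_command": "wal-g wal-push %p",
                    }
                },
            }
        }
    }


def worst_case_loss_seconds(config: dict, sync_replica_alive: bool) -> int:
    """Rough upper bound on lost commits, ignoring upload time."""
    if sync_replica_alive and config["bootstrap"]["dcs"]["synchronous_mode"]:
        # Every acknowledged commit already exists on a surviving replica.
        return 0
    # Only the archive survived: anything since the last segment switch is gone.
    return config["bootstrap"]["dcs"]["postgresql"]["parameters"]["archive_timeout"]
```

The second helper spells out the trade-off: as long as one synchronous replica survives, no acknowledged commit is lost, and if only the S3 archive survives the exposure is roughly one archive interval plus upload time.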

2 comments

You can also try databases that are natively distributed, with replication and scaling built in. If you need SQL, you have many "NewSQL" choices like CockroachDB, Yugabyte, Vitess, TiDB, and others.
Why did you keep your TURN relays out of k8s?
Because we needed geographic distribution so that we don't end up hairpinning our users, and they only run a single service, so the value prop of k8s is much lower. We use Route 53 to do GeoDNS across a number of cheap instances around the world (which is also nice, since it lets you pick regions with cheap bandwidth but good latency to major metro areas). We currently have TURN relays in Las Vegas, New York, and Amsterdam, and that gives us pretty good coverage (sorry Asia...you're just so damn expensive!).

But all of our APIs sit in one k8s cluster across two datacenters (Hetzner, with whom we couldn't be happier).
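
To illustrate the GeoDNS idea above, here is a minimal sketch that picks the closest of the three relay locations for a user. It uses great-circle distance as a stand-in; Route 53 itself routes on geolocation or measured latency rather than raw distance, and the relay keys and approximate city coordinates below are assumptions added for the example.

```python
from math import asin, cos, radians, sin, sqrt

# Approximate city coordinates (latitude, longitude); illustrative only.
RELAYS = {
    "las-vegas": (36.17, -115.14),
    "new-york": (40.71, -74.01),
    "amsterdam": (52.37, 4.90),
}


def haversine_km(a: tuple, b: tuple) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def nearest_relay(user: tuple) -> str:
    """Return the name of the relay closest to the user's location."""
    return min(RELAYS, key=lambda name: haversine_km(user, RELAYS[name]))
```

With this, a user in London would be sent to the Amsterdam relay instead of hairpinning their media through a relay on another continent.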

Really interested in hosting at Hetzner, as their prices are fantastic by comparison to AWS, Azure & GCP.

I'm particularly interested in what an HA Postgres setup might look like. Assuming you are running some kind of database (whether Postgres or otherwise), what are you doing for persistent storage? Are you using Hetzner's cloud block storage volumes? What is performance like?

Interesting! Is that a single k8s control plane for one cluster spanning both data centers? We've gone with fully isolated clusters across 2 data centers to protect against a network isolation incident between them causing a split brain/borking etcd.
Yes, the control plane is only in one of the data centers. The other only runs admin services like offsite backups, our development infra (GitLab, etc.), and CI/CD.

We could definitely do two clusters and probably should, but the secondary data center has so few services that it wasn’t really worth the extra work.
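
The etcd concern raised above comes down to quorum arithmetic: etcd (Raft) needs a strict majority of members to accept writes. A small sketch, with made-up function names, of which side of a two-data-center partition would still have quorum:

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def sides_with_quorum(dc_a: int, dc_b: int) -> dict:
    """For etcd members split across two sites, report which side keeps
    quorum if the link between the sites is cut."""
    need = quorum(dc_a + dc_b)
    return {"dc_a": dc_a >= need, "dc_b": dc_b >= need}
```

With a 2+1 split only the larger side keeps working, with a 2+2 split neither side does, and with all members in one data center (as described here) a partition never costs the control plane its quorum, though losing that data center takes it down entirely.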

Oh cool, interesting. Thanks for the overview