by __turbobrew__
209 days ago
It makes me sad that getting these scalability numbers requires some secret sauce on top of Spanner, which nobody else in the k8s community can benefit from. etcd is the main bottleneck in upstream k8s, and there seems to be no real momentum to build an upstream replacement for etcd/boltdb. I did poke around a while ago to see which interfaces etcd uses to call into boltdb, but the interface doesn't seem very clean right now, so the first step in getting off boltdb would be creating a clean interface that another db could implement.
|
etcd is fine for what it is, but it's a system meant to be reliable and simple to implement. Those are important qualities, but it wasn't built for scale or for speed. Ironically, etcd recommends 5 as the ideal number of cluster members and 7 as a maximum, based on Google's finding from running Chubby that between-member latency gets too big otherwise. Adding members doesn't add capacity either: every member holds a full copy of the data, so you can't ever store more than the suggested maximum of 8GiB. I have no idea what a typical ratio of cluster nodes to total data is, but that only gives you about 64KiB per node for 130,000 nodes, which doesn't seem like very much.
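The per-node figure is easy to check by hand. Here is a minimal sketch of the arithmetic, assuming etcd's suggested 8GiB maximum database size and the 130,000-node count from the thread. The 40GiB case (8GiB times 5 members) is included only for comparison, since replication means members do not add capacity. The function name is made up for illustration.

```python
GIB = 1024 ** 3
KIB = 1024


def per_node_budget_kib(total_bytes: int, nodes: int) -> float:
    """Average number of KiB of datastore available per cluster node."""
    return total_bytes / nodes / KIB


NODES = 130_000  # node count discussed in the thread

# Assumption: 8 GiB is etcd's suggested maximum database size.
replicated = per_node_budget_kib(8 * GIB, NODES)

# Comparison only: what you would get if 5 members pooled their storage.
pooled = per_node_budget_kib(40 * GIB, NODES)

print(f"8 GiB total:  {replicated:.1f} KiB per node")   # 64.5 KiB per node
print(f"40 GiB total: {pooled:.1f} KiB per node")       # 322.6 KiB per node
```

Either way the budget is in the tens to hundreds of KiB per node, not hundreds of MiB.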
There are other options. k3s made kine, which acts as a shim that intercepts the etcd API calls made by the apiserver and translates them into calls to some other DBMS. Originally, this was to make a really small Kubernetes that used an embedded SQLite as its datastore, but you could do the same thing for any arbitrary backend by just changing one side of the shim.
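To make the shim idea concrete, here is a rough sketch of an etcd-like key-value facade backed by SQLite. This is not kine's actual schema or API: the `SqliteKV` class, its method names, and the table layout are all hypothetical. It only shows the general shape of the approach, with revisioned writes and prefix reads on one side and plain SQL on the other.

```python
import sqlite3


class SqliteKV:
    """Hypothetical etcd-like KV facade over SQLite (not kine's real schema)."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        # Append-only log: every write gets a new, increasing revision,
        # loosely mimicking etcd's revisioned store.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS kv ("
            " revision INTEGER PRIMARY KEY AUTOINCREMENT,"
            " key TEXT NOT NULL,"
            " value BLOB,"
            " deleted INTEGER NOT NULL DEFAULT 0)"
        )
        self.db.execute("CREATE INDEX IF NOT EXISTS kv_key ON kv (key, revision)")

    def put(self, key, value):
        """Write a value and return the revision assigned to it."""
        cur = self.db.execute(
            "INSERT INTO kv (key, value) VALUES (?, ?)", (key, value)
        )
        self.db.commit()
        return cur.lastrowid

    def delete(self, key):
        """Record a tombstone row and return its revision."""
        cur = self.db.execute(
            "INSERT INTO kv (key, value, deleted) VALUES (?, NULL, 1)", (key,)
        )
        self.db.commit()
        return cur.lastrowid

    def get(self, key):
        """Return (value, revision) for the latest live write, or None."""
        row = self.db.execute(
            "SELECT value, revision, deleted FROM kv"
            " WHERE key = ? ORDER BY revision DESC LIMIT 1",
            (key,),
        ).fetchone()
        if row is None or row[2]:
            return None
        return row[0], row[1]

    def range(self, prefix):
        """Return (key, value) for the latest live version of each key under prefix."""
        return self.db.execute(
            "SELECT k.key, k.value FROM kv k"
            " JOIN (SELECT key, MAX(revision) AS rev FROM kv GROUP BY key) latest"
            " ON k.key = latest.key AND k.revision = latest.rev"
            " WHERE k.deleted = 0 AND substr(k.key, 1, ?) = ?"
            " ORDER BY k.key",
            (len(prefix), prefix),
        ).fetchall()
```

Swapping the backend would mean reimplementing these few methods against another database while the etcd-facing side stays the same, which is the "changing one side of the shim" part. A real shim also has to handle watches, transactions, leases, and compaction, none of which this sketch attempts.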