by nialldalton
2535 days ago
From the outside it sounds like the database, whatever it is, has far too many critical services tightly bound within it. E.g. leader election is implemented internally instead of as a service with separate lifecycle management, so pushing the query processor's minor version forward forces me to move the leader election code or replica config handling forward too... ick. From the description/comments it also sounds like the database operates directly on files rather than on file leases, as there's no notion of a separate local (cluster-scoped) byte-level replication layer below it. That makes it harder to shoot a stateful node. It also sounds tricky to externally cross-check various rates, i.e. to monitor replication RPCs and notice that certain nodes are drifting away from the expected numbers without depending on the health of the nodes themselves. Hopefully the database doesn't also mix geo-replication for local access requirements / sovereignty into the same mechanisms, rather than separating it out into aggregation layers above purely cluster-scoped zones! Of course, this is all far, far easier said than done given the available open source building blocks. Fun problems while scaling like crazy :)
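
To make the external cross-check idea a bit more concrete, here is a rough sketch of what I mean. It assumes the per-node replication RPC rates are collected somewhere outside the nodes themselves (network counters, a proxy, whatever), and that comparing each node against the cluster median is a good enough notion of "expected". The function name, node names, numbers and the 50% tolerance are all made up for illustration and are not taken from any real system.

```python
from statistics import median


def find_outlier_nodes(rates, tolerance=0.5):
    """Return the nodes whose externally observed RPC rate deviates from
    the cluster median by more than `tolerance` (as a fraction of the median).

    `rates` maps a node name to its observed replication RPCs per second.
    The input is assumed to come from outside the nodes, so a sick node
    cannot hide by misreporting its own health.
    """
    if not rates:
        return []
    mid = median(rates.values())
    if mid == 0:
        # Median of zero: any node with nonzero traffic is the odd one out.
        return sorted(node for node, rate in rates.items() if rate != 0)
    return sorted(
        node for node, rate in rates.items()
        if abs(rate - mid) / mid > tolerance
    )


# Hypothetical sample: node-d is replicating far less than its peers.
observed = {"node-a": 1020.0, "node-b": 980.0, "node-c": 1005.0, "node-d": 310.0}
print(find_outlier_nodes(observed))  # prints ['node-d']
```

The median is used rather than the mean so that a single badly behaving node does not drag the baseline toward itself. A real setup would presumably want windowing and per-role baselines (leaders vs followers), which this leaves out.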