| Great work. Building a distributed database isn't easy at all and takes considerable effort. I'd like to see more on failure scenarios. Here are a few preliminary questions (admittedly, I haven't read the entire post, so apologies if these are answered already): What is the SLA for durability and availability of the db? - How are scenarios like edge locations going down for multiple minutes handled? Are the writes lost? - How many replicas of the data are kept at a single edge location? - Are there limits on the table size (and how does one enforce them in a multi-master mode)? - What's the SLA on replication time to, say, at least 50% of edge locations, and then to 100%? Are DDL operations allowed? If so, how are the conflicts handled? If data is stored in LSMs underneath, how are geo location queries handled? Is there an index/materialised view? If so, how long does that take to generate? What's the TPS/QPS supported for the KV interface? - Is there a scenario where a set of operations at X TPS across edge locations might then take a long time to converge globally? It'd be great if you could list scenarios where the db can lose writes to 'true conflicts' in a FAQ somewhere. - For instance, what happens in a create-delete-create scenario where a table is created at one edge location, deleted at another, and created again at a third, all while writes and reads are happening globally? Thanks. |
We also have a more technical internals paper that is almost complete, and we will publish it next week.
Thanks again! This is such a helpful comment.