|
Sorry to put you on the spot. Perhaps your CTO can chime in when he gets back. Serializable tends to sound really good to database people, because it's the "highest" isolation level available in most databases. However, those databases are built for a single machine, and they really offer "strict serializable" without calling it that, because they maintain a single log which serializes all writes, and all reads operate as of the latest time. With the advent of distributed databases, the "strict" part is becoming more relevant, because it's no longer something you just get without realizing it. I gave an example of the "blog post" anomaly that occurs in a system that only guarantees serializability. As another example, imagine I am using FaunaDB to store bank account balances. As the developer, I design a system where any time a customer receives money to their account, I send them a text giving the amount and their new balance, along with a link to the website where they can get more details. Except, when they click the link and go to the website, they don't see any change to their balance. Furthermore, when they refresh the page, sometimes they do see their new balance, and then they refresh again, and it disappears! More investigation reveals that FaunaDB is in a partitioned state, and was serving stale bank account balances from some replicas. The replicas were behind a load balancer, and sometimes the web client hit one replica, and sometimes another, which explains the alternating balances. Is it possible to solve this problem using clever application logic, causal tokens, or saying "don't do that"? Sure. But a consistent database is supposed to prevent this scenario, not the application. Also, note that the "this is no different than increased client-server latency" argument does not hold here. People expect that if they finish action A (like writing to the DB), then do action B (like clicking a web link that reads the DB), that B will always see the results of A. The stale bank account balance isn't a delayed value; it's a wrong value. You're correct that most systems allow some variant of "secondary" or "follower" reads that can return stale data, in exchange for better performance. I'm all for this kind of feature, as long as the developer understands the tradeoff they're making. Moreover, I think it's appropriate for FaunaDB to offer the same. But I'd advise against making unqualified claims of "complete guarantees of serializability and consistency" and the "latency profile of an eventually-consistent system like Apache Cassandra". FaunaDB is able to offer one or the other, but not both at once. |
I think it's a stretch to say that including a single timestamp in the link is too hard for developers. It happens by default across queries within a process with the Fauna drivers, so the database is preventing it; the only place you have to think about it is multi-process, multi-location reads. Also, having a loadbalancer jump back and forth between two replicas would not normally occur because of a geo routing, a soft form of datacenter affinity, but it is possible. Note that in your example, even if the link didn't include the timestamp, once they visit the webpage, the webserver should begin managing the timestamp in process or in a session cookie, preventing the the balance from disappearing once seen for the first time.
In my experience as a DBA the legacy database experience is much worse because in practice there are always physically distinct read replicas in the mix for scalability and hot failover that have no consistency guarantees at all, due to various faults like statement-based replication desynchronization, the lack of any ability to share log cursor information, and the like. I am not making a worse is better argument. I'm just pointing out that you don't get any C at all for the feeble amount of A gained under CAP in practice with most “single node” systems.
You are correct that the latency argument does not hold if you do not pass the timestamp information across queries. But if the observer can observe multiple replicas in the database topology, it is always possible to use the observer itself as the communication channel for the timestamp to avoid the anomaly. If you want to maintain strict serializability in all cases, but minimum-latency reads, the writer could block on transaction application acknowledgement from all replicas. Obviously that gives up availability for writes under partition, but it may be reasonable in your asynchronous example in order to avoid sending the txt message too early. Perhaps, for example, the message is "go to your local ATM" instead of a link with a timestamp. Although I would expect the ATM had the latency budget to do a linearized read even in a global context.
We are always happy to make the descriptions of the behavior and consistency levels more precise.