|
In your last paragraph, I feel like you are mischaracterizing my overall thesis. I am not claiming the design is simple: MVCC took many lives in sacrifice to its specification and discovery, and I certainly am not claiming "anyone could have thought that up". Instead, my primary issue is that this is a talk about databases and database design that is providing motivation vs a strawman: specifically, the way Rich seems to believe "traditional databases" work, and for which we spend the first almost 20 minutes learning the negatives, roadblocks, and general downsides. However, almost none of the things that he indicates actually are downsides of most modern database systems, and certainly not of PostgreSQL. His downsides include that the data structuring is simplistic, that you can't have efficient and atomic replication of it (not multi-master mind you, but seemingly even doing real-time replication of a master to a read-only slave while maintaining serialization semantics seems to be dismissed), and that if you attempt to make multiple queries you will get inconsistent data due to update-in-place storage. Yes: update-in-place "storage", not "update-in-place semantics within the scope of an individual transaction". Even if he was very clear about the latter (which is again quite different from "update-in-place semantics", which MVCC definitely does not have), that would still undermine his points, as the problem of inconsistent data from multiple reads, a problem he goes into great detail about with an example involving a request for a webpage that needs to make a query first for its backend data and then for its display information, does not exist with MVCC. During this discussion of storage, he specifically talks about how existing database storage systems work, not at the model level, but at the disk level, discussing how b-trees and indexes are implemented with their destructive semantics... and all of these details are wrong, at least for PostgreSQL and Oracle, and I believe even for MySQL InnoDB (although a lot of its MVCC semantics are in-memory-only AFAIK, so I'm happily willing to believe that it actually destroys b-tree nodes on disk). The talk then discusses a new way of storing data, and that new way of storing data happens to share the key property he calls new with the old way of storing data. The result is that it is very difficult to see why I should be listening to this talk, as the speaker either doesn't know much about existing database design or is purposely lying to me to make this new technology sound more interesting :(. Your response that in a different talk he attempted to backpatch his argument with something that still doesn't seem to address MVCC's detectably-not-the-same-as-update-in-place-semantics doesn't help this. Now, as I stated up front, after listening to half of this talk, I couldn't take it anymore, and I gave up: I thereby didn't hear an entire half hour of him speaking. Maybe somewhere in that second half there is something new about how some particular quirk of his model allows you to get a distributed system, but that seemed sufficiently unlikely after the first half that it really doesn't seem worth it, and based on the comments from discussion (such as in the threads started by bsaul and sriram_malhar, which seems to indicate that writes are centralized and reads are distributed, something you can do with any off-the-shelf SQL solution these days) that seems to hold up. |
The differences I am pointing out, and the notion of place I discuss, are not about the implementation details in the small (e.g. whether or not a db is MVCC or updates its btree nodes in place) but the model in the large. If you 'update' someone's email is the old email gone? Must you be inside a transaction to see something consistent? Is the system oriented around preserving information (the facts of events that have happened), or is the system oriented around maintaining a single logical value of a model?
The fact is with PostgreSQL et al, if you 'update' someone's email the old one is gone, and you can only get consistency within a transaction. It is a system oriented around maintaining a single logical value of a model. And there's nothing wrong with that - it's a great system with a lot of utility. But it isn't otherwise just because you say it could be.
Also, you seem to be reacting as if I (or someone) has claimed that Datomic is revolutionary. I have never made such claims. Nothing is truly novel, everything has been tried before, and we all stand on the shoulders of giants.
I'm sorry my talk didn't convey to you my principal points, and am happy to clarify.