by richhickey
5037 days ago
The model of consistency envisioned by Datomic is one in which consistency normally available only within a transaction is available outside of any transactions, and without any central authority. Consistent views can be reconstituted the next hour, day or week. Consistent points in time can be efficiently communicated to other processes. Nothing about MVCC gives you any of that.
MVCC is an implementation detail that reduces coordination overhead in transactional systems. I used MVCC in the implementation of Clojure's STM. While you might imagine it being simple to flip a bit on an MVCC system and get point-in-time support, it is a) not efficient to do so, and b) still a coordinated transactional system. The differences I am pointing out, and the notion of place I discuss, are not about the implementation details in the small (e.g. whether or not a db is MVCC or updates its btree nodes in place) but the model in the large.
If you 'update' someone's email is the old email gone? Must you be inside a transaction to see something consistent? Is the system oriented around preserving information (the facts of events that have happened), or is the system oriented around maintaining a single logical value of a model?
The fact is with PostgreSQL et al, if you 'update' someone's email the old one is gone, and you can only get consistency within a transaction. It is a system oriented around maintaining a single logical value of a model. And there's nothing wrong with that - it's a great system with a lot of utility. But it isn't otherwise just because you say it could be.
Also, you seem to be reacting as if I (or someone) has claimed that Datomic is revolutionary. I have never made such claims. Nothing is truly novel, everything has been tried before, and we all stand on the shoulders of giants.
I'm sorry my talk didn't convey to you my principal points, and am happy to clarify.
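To make the distinction in the large concrete, here is a minimal sketch contrasting a store that maintains a single current value per key with one that only accumulates facts. It is a toy illustration, not Datomic's API or PostgreSQL's behavior: the names `PlaceStore`, `FactLog`, `assert_fact`, and `as_of` are hypothetical, and the time point is assumed to be a simple increasing integer.

```python
class PlaceStore:
    """Place-oriented: one current value per key; an update overwrites it."""

    def __init__(self):
        self._rows = {}

    def update(self, key, value):
        # The previous value is no longer reachable after this assignment.
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)


class FactLog:
    """Accumulate-only: every fact is kept with the time point at which it was added."""

    def __init__(self):
        self._facts = []  # (t, key, value) tuples, appended in increasing t
        self._t = 0

    def assert_fact(self, key, value):
        self._t += 1
        self._facts.append((self._t, key, value))
        # The returned integer is a value that can be passed to another process.
        return self._t

    def as_of(self, t):
        """Rebuild the view as of time point t; no transaction needs to be open."""
        view = {}
        for fact_t, key, value in self._facts:
            if fact_t > t:
                break
            view[key] = value
        return view
```

With `PlaceStore`, a second `update("alice", ...)` leaves only the new email. With `FactLog`, the integer returned by the first `assert_fact` can be kept and passed to `as_of` at any later time to see the old email again.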
|
For the record, I do not believe that you have explicitly stated this is revolutionary, although I believe various other people on HN in various threads on Datomic have. However, my specific reactions in the comment you are responding to are due to DanWaterworth's insistence that I believe that it is trivial: my original comment does not touch on this angle, and is entirely about "real databases aren't implemented like this".
That said, I do believe that after 30 minutes of listening to a talk that doesn't mention "this is largely how existing systems are implemented, but we provide the ability to see all the rows at once", there is an implication "this isn't at all like anything you've ever seen or implemented before", which is why after DanWaterworth's comment, I started exploring that angle.
Yes: in the case of PostgreSQL's MVCC, the old e-mail is gone from the perspective of the model for other people not inside of a transaction viewing the contents; however, the kinds of problems you were describing at the beginning of the talk did not need to avoid transactions.
However, the implementation is so close that if I were explaining this concept to someone else, I'd probably use it as a model, especially given that it even already reifies the special columns required to let you do the historical lookups (xmin and xmax).
As I mentioned in another comment on this thread (albeit in an edit a few minutes later), you can get historical lookup in PostgreSQL by just adding a transaction variable that turns off the mechanism that filters obsolete tuples: you can then use the already-existing transaction identifier mechanism and the already-existing xmin and xmax columns as the ordering.
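As a rough sketch of the lookup described above, here is a toy model of choosing a tuple version by its xmin and xmax. This is not PostgreSQL code and does not use any real PostgreSQL setting: `TupleVersion`, `visible_as_of`, and `lookup_as_of` are hypothetical names, every transaction is assumed to have committed, transaction ids are assumed to increase without wraparound, and vacuuming of obsolete tuples is ignored. An unset xmax is modeled as `None` for readability.

```python
from typing import NamedTuple, Optional


class TupleVersion(NamedTuple):
    xmin: int            # id of the transaction that created this version
    xmax: Optional[int]  # id of the transaction that superseded it, None if still live
    user: str
    email: str


def visible_as_of(version, txid):
    """True if this version was the live one as of transaction id txid."""
    created = version.xmin <= txid
    not_yet_replaced = version.xmax is None or version.xmax > txid
    return created and not_yet_replaced


def lookup_as_of(heap, user, txid):
    """Scan every stored version, obsolete ones included, and pick the one live at txid."""
    for version in heap:
        if version.user == user and visible_as_of(version, txid):
            return version.email
    return None


# Example data: transaction 105 changed alice's email, superseding the version from 100.
heap = [
    TupleVersion(xmin=100, xmax=105, user="alice", email="old@example.com"),
    TupleVersion(xmin=105, xmax=None, user="alice", email="new@example.com"),
]
```

Under these assumptions, `lookup_as_of(heap, "alice", 104)` finds the old version and `lookup_as_of(heap, "alice", 105)` finds the new one, using the transaction id as the ordering.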
The result is then that I'm watching the talk wondering where the motivation is: many of the listed motivations weren't really true faults of the existing systems, and the ones that remain seem like implementation details of the database technology.
In the latter situation, when I say it "could be" I really do mean "it is": PostgreSQL can take advantage of the fact that it is built out of MVCC when it builds other parts of itself, such as its streaming master/slave replication (which is another feature of many existing systems that you seemed to discount in your motivation section).
I am therefore simply not certain what the problem is that Datomic is trying to solve for me, whether it be revolutionary or evolutionary (again: I don't really care; I'm just commenting on the motivation section), as the listed motivations seem to be fighting against a strawman design for a database solution that doesn't have transactions to get you 90% there and isn't itself implemented on, and taking advantage of, append-only storage.