by richhickey 5037 days ago
The model of consistency envisioned by Datomic is one in which consistency normally available only within a transaction is available outside of any transactions, and without any central authority. Consistent views can be reconstituted the next hour, day or week. Consistent points in time can be efficiently communicated to other processes. Nothing about MVCC gives you any of that. MVCC is an implementation detail that reduces coordination overhead in transactional systems. I used MVCC in the implementation of Clojure's STM. While you might imagine it being simple to flip a bit on an MVCC system and get point-in-time support, it is a) not efficient to do so, and b) still a coordinated transactional system.

The differences I am pointing out, and the notion of place I discuss, are not about the implementation details in the small (e.g. whether or not a db is MVCC or updates its btree nodes in place) but the model in the large. If you 'update' someone's email is the old email gone? Must you be inside a transaction to see something consistent? Is the system oriented around preserving information (the facts of events that have happened), or is the system oriented around maintaining a single logical value of a model?

The fact is with PostgreSQL et al, if you 'update' someone's email the old one is gone, and you can only get consistency within a transaction. It is a system oriented around maintaining a single logical value of a model. And there's nothing wrong with that - it's a great system with a lot of utility. But it isn't otherwise just because you say it could be.
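To make the contrast concrete, here is a minimal Python sketch of the two models, not Datomic's or PostgreSQL's actual API. `PlaceStore`, `FactStore`, `assert_fact` and `as_of` are hypothetical names invented for illustration. The first store keeps one current value per key; the second keeps every assertion and can rebuild a consistent view for any earlier point in time without holding a transaction open.

```python
from dataclasses import dataclass, field


@dataclass
class PlaceStore:
    """Update-in-place: one current value per key; an update overwrites the old one."""
    rows: dict = field(default_factory=dict)

    def update(self, key, value):
        self.rows[key] = value

    def get(self, key):
        return self.rows.get(key)


@dataclass
class FactStore:
    """Append-only: every assertion is kept, tagged with an increasing time t."""
    log: list = field(default_factory=list)  # (t, key, value) tuples, ordered by t
    t: int = 0

    def assert_fact(self, key, value):
        self.t += 1
        self.log.append((self.t, key, value))
        return self.t  # a point in time that can be handed to another process

    def as_of(self, t):
        """Rebuild the view as of time t by replaying the log up to and including t."""
        view = {}
        for ts, key, value in self.log:
            if ts > t:
                break
            view[key] = value
        return view


place = PlaceStore()
place.update("alice", "alice@old.example")
place.update("alice", "alice@new.example")  # the old email is no longer recoverable

facts = FactStore()
t1 = facts.assert_fact("alice", "alice@old.example")
t2 = facts.assert_fact("alice", "alice@new.example")
old_view = facts.as_of(t1)  # still answerable an hour, a day or a week later
new_view = facts.as_of(t2)
```

The value `t1` is just an integer here, which is what makes a consistent point in time cheap to communicate in this toy model.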

Also, you seem to be reacting as if I (or someone) had claimed that Datomic is revolutionary. I have never made such claims. Nothing is truly novel, everything has been tried before, and we all stand on the shoulders of giants.

I'm sorry my talk didn't convey to you my principal points, and am happy to clarify.

1 comments

First of all, thank you very much for the reply: you really didn't need to bother, as despite being a Clojure user who stores a lot of data, I'm probably simply not in your target market segment ;P.

For the record, I do not believe that you have explicitly stated this is revolutionary, although I believe various other people on HN in various threads on Datomic have. However, my specific reactions in the comment you are responding to are due to DanWaterworth's insistence that I believe that it is trivial: my original comment does not touch on this angle, and is entirely about "real databases aren't implemented like this".

That said, I do believe that if after 30 minutes of listening to a talk that doesn't mention "this is largely how existing systems are implemented, but we provide the ability to see all the rows at once", there is an implication "this isn't at all like anything you've ever seen or implemented before", which is why after DanWaterworth's comment, I started exploring that angle.

Yes: in the case of PostgreSQL's MVCC, the old e-mail is gone from the perspective of the model for other people not inside of a transaction viewing the contents, however the kinds of problems you were describing at the beginning of the talk did not need to avoid transactions.

However, the implementation is so close that if I were explaining this concept to someone else, I'd probably use it as a model, especially given that it even already reifies the special columns required to let you do the historical lookups (xmin and xmax).

As I mentioned in another comment on this thread (albeit in an edit a few minutes later), you can get historical lookup in PostgreSQL by just adding a transaction variable that turns off the mechanism that filters obsolete tuples: you can then use the already-existing transaction identifier mechanism and the already-existing xmin and xmax columns as the ordering.
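A toy Python model of that idea, for readers who have not looked at the xmin and xmax columns before. This is a sketch under heavy simplifying assumptions, not PostgreSQL's real visibility logic: it ignores aborted and in-progress transactions, transaction id wraparound, and VACUUM, which in a real installation reclaims obsolete tuples. `ToyHeap`, `TupleVersion`, `current`, `history` and `as_of` are hypothetical names; only `xmin` and `xmax` come from the comment above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TupleVersion:
    key: str
    value: str
    xmin: int                   # id of the transaction that created this version
    xmax: Optional[int] = None  # id of the transaction that superseded it, if any


class ToyHeap:
    def __init__(self):
        self.versions = []
        self.next_xid = 1

    def write(self, key, value):
        """Insert or 'update': stamp the live version with xmax, then append a new one."""
        xid = self.next_xid
        self.next_xid += 1
        for v in self.versions:
            if v.key == key and v.xmax is None:
                v.xmax = xid
        self.versions.append(TupleVersion(key, value, xmin=xid))
        return xid

    def current(self, key):
        """Normal read: obsolete versions (xmax set) are filtered out."""
        for v in self.versions:
            if v.key == key and v.xmax is None:
                return v
        return None

    def history(self, key):
        """Read with the obsolete-tuple filter switched off, ordered by xmin."""
        return sorted((v for v in self.versions if v.key == key), key=lambda v: v.xmin)

    def as_of(self, key, xid):
        """Historical lookup: the version that was live just after transaction xid."""
        for v in self.history(key):
            if v.xmin <= xid and (v.xmax is None or v.xmax > xid):
                return v
        return None


heap = ToyHeap()
x1 = heap.write("alice", "alice@old.example")
x2 = heap.write("alice", "alice@new.example")

latest = heap.current("alice")
all_versions = [v.value for v in heap.history("alice")]
before_update = heap.as_of("alice", x1)
```

In this model the old version is never deleted, only stamped with `xmax`, so the historical lookup is the ordinary visibility test evaluated against an older transaction id.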

The result is then that I'm watching the talk wondering where the motivation is: many of the listed motivations weren't really true faults of the existing systems, and the ones that remain seem like implementation details of the database technology.

In the latter situation, when I say it "could be" I really do mean "it is": PostgreSQL can take advantage of the fact that it is built out of MVCC when it builds other parts of itself, such as its streaming master/slave replication (which is another feature of many existing systems that you seemed to discount in your motivation section).

I am thereby simply not certain what the problem is that Datomic is trying to solve for me, whether it be revolutionary or evolutionary (again: I don't really care; I'm just commenting on the motivation section), as the listed motivations seem to be fighting against a strawman design for a database solution that doesn't have transactions to get you 90% there and isn't itself implemented and taking advantage of append-only storage.

Well, all you point out is that one aspect of Datomic could be implemented with some SQL systems. Datomic, however, has many other aspects that are interesting.

Other than that, the true genius is in recognizing that a system like that would be worthwhile. Just pointing out that one could theoretically do that with something else is kind of pointless if nobody has ever done it.

I am not saying "Datomic is stupid" or anything so simple; I'm saying I was "disappointed" in this talk because it motivated Datomic against a strawman that mischaracterized the actual problems that people using "traditional databases" have sufficiently that it was no longer possible to determine what was actually being claimed as an advantage.

I realize that to many people it is impossible to dislike a presentation of something without disliking the thing being presented, the person making the presentation, and the entire ideology behind the presentation, but that is a horrible thing to assume and is unlikely to ever be the case to such a simple extreme.

I will even go so far as to say that watching this talk seems to be doing a disservice to many people on the road to doing them a legitimate service: some of the people commenting on this thread (or previous ones on HN about similar talks and articles about Datomic) actually do/did not realize that "traditional databases" can even do this at a transaction level, as the argument in the talk downright claims they can't.

The result is that when I bring up that you actually get even some of these advantages with off-the-shelf copies of PostgreSQL, I get comments of the form "I had no idea one could get a consistent read view across multiple queries within a transaction using most sql databases. That does poke a hole in a major benefit that I thought was unique to datomic, great to know!"; that can only happen when there is some serious misinformation (accidentally) being presented.
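For anyone who wants to see a consistent read view across multiple queries for themselves, here is a small runnable sketch. It uses SQLite in WAL mode from Python's standard library as a stand-in, purely because it needs no server; it is not PostgreSQL. In PostgreSQL the closest equivalent would be a transaction at REPEATABLE READ (or SERIALIZABLE) isolation, since the default READ COMMITTED level takes a fresh snapshot for each statement. The table and values are made up for the example.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None means autocommit, so BEGIN and COMMIT are issued explicitly.
writer = sqlite3.connect(path, isolation_level=None)
writer.execute("PRAGMA journal_mode=WAL")  # WAL lets a writer commit while a reader is open
writer.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
writer.execute("INSERT INTO users VALUES (1, 'alice@old.example')")

reader = sqlite3.connect(path, isolation_level=None)


def read_email(conn):
    return conn.execute("SELECT email FROM users WHERE id = 1").fetchall()[0][0]


reader.execute("BEGIN")
first = read_email(reader)   # the first read fixes the reader's snapshot

# A second connection updates the row and commits immediately (autocommit).
writer.execute("UPDATE users SET email = 'alice@new.example' WHERE id = 1")

second = read_email(reader)  # same transaction: still the old snapshot
reader.execute("COMMIT")
third = read_email(reader)   # outside the transaction: the update is visible

reader.close()
writer.close()
```

Here `first` and `second` both hold the old address and `third` holds the new one. The consistent view lasts only as long as the reader's transaction stays open.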

Now, does that mean that Datomic is something no one should use, and that it doesn't put things together in a really nice way, and that it doesn't have a single thing in it that is innovative, or that Rich is wasting his time working on it? No: certainly it does not. I did not claim that. I can't even claim that, as I gave up on the talk after the first half so I could spend my time attempting to clarify some of the things said in the first half that were confusing people.