| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by programnature 5216 days ago

It is totally ordered.

The transactor is a single point of failure.

However, since its only job is doing the transactions, the idea is it can be faster than a database server that does both the transactions and the queries.

4 comments

snprbob86 5215 days ago

Hmm... presumably an application can act in read-only mode in the absence of a transactor. That's an interesting thought :-)

link

weavejester 5215 days ago

The transactor is only used for writes, so if the transactor went down, you could still run queries.

link

alkby 5214 days ago

I think their statement about ACID is too bold.

How does somebody do read-"modify" style of transactions ?

Say I want to bump some counter. So I delete old fact and I establish new fact. But new fact needs to be exactly 1 + old value of counter. With transactions as simple "add this and remove that" you seemingly cannot do that. So it's not ACID. Right?

link

richhickey 5214 days ago

Transactions are not limited to add/retract. There are also things we call data functions, which are arbitrary, user-written, expansion functions that are passed the current value of the db (within the transaction) and any arbitrary args (passed in the call), and that emit a list of adds/retracts and/or other data function calls. This result gets spliced in place of the data function call. This expansion continues until the resulting transaction is only asserts/retracts, then gets applied. With this, increments, CAS and much more are possible.

We are still finalizing the API for installing your own data functions. The :db.fn/retractEntity call in the tutorial is an example of a data function. (retractEntity is built-in).

This call:

    [:db.fn/retractEntity entity-id]

must find all the in- and out-bound attributes relating to that entity-id (and does so via a query) and emit retracts for them. You will be able to write data functions of similar power. Sorry for the confusion, more and better docs are coming.

link

danieljomphe 5214 days ago

From what I remember, compare-and-swap semantics are in place for that kind of case.

If that was not the case, you could still model such an order-dependent update as the fact that the counter has seen one more hit. Let the final query reduce that to the final count, and let the local cache implementation optimize that cost away for all but the first query, and then incrementally optimize the further queries when they are to see an increased count.

That said, I'm pretty sure I've seen the simpler CAS semantics support. (The CAS-successful update, if CAS is really supported, is still implemented as an "upsert", which means old counter values remain accessible if you query the past of the DB.)

link

danieljomphe 5214 days ago

Forget my last paragraph. Anyways, richhickey answered. :)

link

anamax 5215 days ago

> However, since [the transactor's] only job is doing the transactions

Huh? How is that consistent with:

> access the data storage through a new distributed component called a transactor.

If "doing the transactions" consists of more than passing out incrementing transaction tokens, won't the transactor be a bottleneck?

link

rjn945 5215 days ago

Yeah, it looks like I got that part wrong. (I intentionally skimmed over the transactor, because I was avoiding "how" issues and because my understanding of it wasn't that clear.)

The transactor is involved in just writes, not reads. (So that helps.) It's not distributed and cannot be distributed, in this system, because it ensures consistency, so yes, it is potentially a bottleneck. In blog comments by Rich Hickey[1], he states:

"Writes can’t be distributed, and that is one of the many tradeoffs that preclude the possibility of any universal data solution. The idea is that, by stripping out all the other work normally done by the server (queries, reads, locking, disk sync), many workloads will be supported by this configuration. We don’t target the highest write volumes, as those workloads require different tradeoffs."

Presumably, 1) the creators of Datomic think that performance can be good enough to be useful, 2) this is a new model that probably requires testing to prove is practical.

[1] Multiple people have linked to it, but for convenience: http://blog.fogus.me/2012/03/05/datomic/comment-page-1/#comm...

link

sausagefeet 5215 days ago

Isn't this the same compromise we would already have had to make if we just used postgres?

link

weavejester 5215 days ago

It's actually slightly better than a SQL database. If your master SQL database gets fried, there's a chance you could lose some data. Datomic's transactor only handles atomicity, not writes, so if the transactor dies, nothing written to the database will be lost.

link