by CmdrDats 4139 days ago
Technically it could - but I felt Datomic is particularly strong as an aggregate view. Adding transactions that just have events attached to them isn't a particularly useful aggregate to query.

Having the event store separate means that you can keep Datomic focused on what it's really good at. When you find good aggregate views for your events, you can scan through old events and populate Datomic.
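A minimal sketch of that separation, assuming a plain append-only list stands in for the event store and a dict stands in for the aggregate view (the `Event`, `EventStore` and `page_views_per_user` names are hypothetical, not the cqrs-server or Datomic API). Events can be as rich as you like; the projection picks out only what the aggregate needs and can be re-run over old events whenever a new view turns out to be useful.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Iterator, List


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape: a type name, a timestamp and a free-form payload.
    kind: str
    at: datetime
    payload: Dict[str, Any] = field(default_factory=dict)


class EventStore:
    """Append-only log standing in for the separate event store."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def scan(self) -> Iterator[Event]:
        # Replay events in the order they were written.
        yield from self._events


def page_views_per_user(store: EventStore) -> Dict[str, int]:
    """Build an aggregate view by scanning every stored event.

    Only 'page_view' events and their 'user' field are used; the rest of each
    payload stays in the event store and costs the aggregate nothing.
    """
    counts: Dict[str, int] = {}
    for event in store.scan():
        if event.kind == "page_view":
            user = event.payload["user"]
            counts[user] = counts.get(user, 0) + 1
    return counts


store = EventStore()
store.append(Event("page_view", datetime(2014, 1, 1), {"user": "ann", "url": "/"}))
store.append(Event("page_view", datetime(2014, 1, 2), {"user": "ann", "url": "/about"}))
store.append(Event("signup", datetime(2014, 1, 2), {"user": "bob", "plan": "free"}))
print(page_views_per_user(store))  # {'ann': 2}
```

A new aggregate later on is just another function over `store.scan()`; nothing about the stored events has to change.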

This separation also means that you can make the events as rich and plentiful as you want without burdening Datomic with stuff that you don't have a good way to query anyhow.


The event is the transaction.

For example, if I have a 'make_user_preferred' event, I just transact the change to the database and then add any metadata I need to the Datomic transaction.

Datomic is effectively Event Sourcing where you create a realized version of your database on every event. It's just smart about it and keeps only the diffs.
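To make "keeps diffs" concrete, here is a toy model, assuming each transaction records only the attribute changes it makes and a full "realized" database is rebuilt on demand as of any transaction. The `DiffLog` class is hypothetical; Datomic's real storage (immutable datoms in covering indexes) is far more sophisticated. This only shows why no full snapshot has to be copied per event.

```python
from typing import Any, Dict, List, Tuple

# A transaction records only what it changed: (entity, attribute, value, added?).
Diff = List[Tuple[str, str, Any, bool]]


class DiffLog:
    def __init__(self) -> None:
        self.transactions: List[Diff] = []

    def transact(self, diff: Diff) -> int:
        self.transactions.append(diff)
        return len(self.transactions) - 1  # transaction id = position in the log

    def as_of(self, tx_id: int) -> Dict[str, Dict[str, Any]]:
        """Realize the whole database as it stood right after transaction tx_id."""
        db: Dict[str, Dict[str, Any]] = {}
        for diff in self.transactions[: tx_id + 1]:
            for entity, attr, value, added in diff:
                attrs = db.setdefault(entity, {})
                if added:
                    attrs[attr] = value
                elif attrs.get(attr) == value:
                    del attrs[attr]  # retract the current value
        return db


log = DiffLog()
t0 = log.transact([("user-1", "name", "Ann", True), ("user-1", "preferred", False, True)])
# A 'make_user_preferred' event touches one attribute; nothing else is copied.
t1 = log.transact([("user-1", "preferred", False, False), ("user-1", "preferred", True, True)])
print(log.as_of(t0))  # {'user-1': {'name': 'Ann', 'preferred': False}}
print(log.as_of(t1))  # {'user-1': {'name': 'Ann', 'preferred': True}}
```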

> Adding transactions that just have events attached to them isn't a particularly useful aggregate to query.

I still don't understand.

In Datomic, the transaction log can be queried as easily as the rest of the data. The view exists on the client. Changes are already queued by the transactor and pushed from the transactor to the clients.

So to have a concrete example: in our preliminary test of Datomic, we threw all our page view events directly at Datomic. Every single attribute in the event gets indexed at least 3 times and it grew out of a manageable scale really quickly. We had to find a solution where we can still maintain that data and aggregate it into Datomic when and where we can make useful sense out of it.
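A simplified picture of why storage grows so fast, assuming every event attribute becomes one datom and each datom is kept in three sort orders named after Datomic's EAVT, AEVT and AVET indexes. This is not how Datomic actually lays out storage (segments, the log, and which indexes a given attribute participates in all differ); the event fields are hypothetical, and the point is only the multiplication: attributes per event times indexes per datom.

```python
from typing import Any, Dict, List, Tuple

Datom = Tuple[int, str, Any, int]  # (entity, attribute, value, transaction)


def event_to_datoms(entity: int, event: Dict[str, Any], tx: int) -> List[Datom]:
    # Every attribute of the event becomes its own datom.
    return [(entity, attr, value, tx) for attr, value in event.items()]


def build_indexes(datoms: List[Datom]) -> Dict[str, List[Datom]]:
    """Keep a full copy of every datom in each sort order (covering indexes).

    Values are compared as strings only so mixed types sort without errors.
    """
    return {
        "EAVT": sorted(datoms, key=lambda d: (d[0], d[1], str(d[2]), d[3])),
        "AEVT": sorted(datoms, key=lambda d: (d[1], d[0], str(d[2]), d[3])),
        "AVET": sorted(datoms, key=lambda d: (d[1], str(d[2]), d[0], d[3])),
    }


# Hypothetical page-view event with five attributes.
page_view = {"url": "/pricing", "referrer": "/", "user_agent": "Mozilla/5.0",
             "session": "abc123", "ms_on_page": 5400}
datoms = [d for n in range(1000) for d in event_to_datoms(n, page_view, tx=n)]
indexes = build_indexes(datoms)
stored = sum(len(rows) for rows in indexes.values())
print(len(datoms), stored)  # 5000 15000
```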

In certain cases you can definitely put the events directly into the transactions. In our case, it's just not the right fit and I suspect that many others will find the same constraints apply.

> Every single attribute in the event gets indexed at least 3 times and it grew out of a manageable scale really quickly.

Ok, so it was a technical issue. I assume you attempted to change the indexing.

The technical reasoning did not come through in the project page or your comments.

It's not a technical issue. The fact that field "x" in a record changed from 4 to 5 (even if you can know every value that has ever been in that field) will never be able to tell you why that value changed.

Was it because a CSR got a call from a pissed-off customer and agreed to change it? Was it because a third-party system fucked up and decided to give the customer a little more "x"?

This is where Domain-Driven Design comes in and CQRS/Event Sourcing really shines. The net effect of those two events I just described may be exactly the same for MY domain, but if there are other systems listening for the domain's events, they may come to radically different conclusions. In CQRS/ES you would record those two things as semantically different events.
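A minimal sketch of that point, with hypothetical event and listener names: two event types that have an identical effect on our own state ("x" goes from 4 to 5 either way), recorded as different types so that a downstream listener can still tell them apart.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class CreditGrantedByCsr:
    customer: str
    amount: int
    reason: str  # what the customer-service rep was responding to


@dataclass(frozen=True)
class CreditGrantedByPartnerSystem:
    customer: str
    amount: int
    partner: str  # which third-party system issued it


CreditEvent = Union[CreditGrantedByCsr, CreditGrantedByPartnerSystem]


def apply(balances: Dict[str, int], event: CreditEvent) -> None:
    """Our own domain: both events have exactly the same net effect on state."""
    balances[event.customer] = balances.get(event.customer, 0) + event.amount


def fraud_listener(event: CreditEvent) -> Optional[str]:
    """A different system listening to our events reacts to *why* it happened."""
    if isinstance(event, CreditGrantedByPartnerSystem):
        return f"investigate {event.partner}"
    return None  # a goodwill credit from a CSR needs no follow-up


csr = CreditGrantedByCsr("acme", 1, "angry phone call")
partner = CreditGrantedByPartnerSystem("acme", 1, "billing-partner")
a, b = {"acme": 4}, {"acme": 4}
apply(a, csr)
apply(b, partner)
print(a == b, a)                # True {'acme': 5}
print(fraud_listener(csr))      # None
print(fraud_listener(partner))  # investigate billing-partner
```

Looking only at the resulting state (`a == b`), the two cases are indistinguishable; the event type is what carries the "why".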

It really upsets me that people don't get this about CQRS/ES. Maybe I should just start spamming links to Greg Young's talks...

I believe you misunderstand my comments.

A Datomic transaction is also an entity, with associated facts about that transaction. You can add custom facts, such as your application's domain event, directly to the transaction entity to further describe the event/transaction. You can then query over the transactions as you would any other data. In CQRS/ES terms, it is an event store with snapshots on every event without the cost of duplicate data.
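A toy model of that idea in plain Python, not Datomic's API: each transaction gets its own entity id, and its metadata (here the hypothetical `event/type` and `event/user` attributes) are ordinary facts about that id, stored alongside the data the transaction changed. "Which transactions were make_user_preferred events, and what did they change?" then becomes an ordinary query over the same store.

```python
from itertools import count
from typing import Any, Dict, List, Tuple

Datom = Tuple[int, str, Any, int]  # (entity, attribute, value, tx)


class Store:
    def __init__(self) -> None:
        self.datoms: List[Datom] = []
        self._tx_ids = count(1000)  # hypothetical id allocator for transactions

    def transact(self, facts: List[Tuple[int, str, Any]], tx_meta: Dict[str, Any]) -> int:
        tx = next(self._tx_ids)
        self.datoms += [(e, a, v, tx) for e, a, v in facts]
        # The transaction is an entity too: its metadata are datoms about it.
        self.datoms += [(tx, a, v, tx) for a, v in tx_meta.items()]
        return tx

    def txs_for_event(self, event_type: str) -> List[int]:
        return [e for e, a, v, tx in self.datoms
                if e == tx and a == "event/type" and v == event_type]

    def changes_in(self, tx_id: int) -> List[Tuple[int, str, Any]]:
        # Data changed by the transaction, excluding facts about the transaction itself.
        return [(e, a, v) for e, a, v, tx in self.datoms if tx == tx_id and e != tx_id]


store = Store()
store.transact([(1, "user/name", "Ann")], {"event/type": "register_user"})
tx = store.transact([(1, "user/preferred", True)],
                    {"event/type": "make_user_preferred", "event/user": "csr-7"})
print(store.txs_for_event("make_user_preferred") == [tx])  # True
print(store.changes_in(tx))  # [(1, 'user/preferred', True)]
```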

The technical issue appears to be that their event stream was of a high enough throughput that the indexing caused a space issue.

My apologies - my example was to illustrate a fit mismatch, not so much technical deficiencies.

Let's try it from a different angle: If you wanted to include every attribute of every event into Datomic as well as the aggregate view, you would have to add schema attributes for every event property AND the logical aggregate properties.

The extra schema doesn't really buy you anything other than being able to query your events, which you'll usually do by event type and date range. If you do need richer querying you're likely looking at an aggregate that happens to match the events. That's ok, and a design decision you can make.
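For the common "by event type and date range" access pattern, a store does not need a rich schema at all. A minimal sketch, assuming events are grouped by type and kept sorted by timestamp (the `TypedEventLog` name and layout are hypothetical; a table with the event type as partition key and a timestamp sort key has the same shape):

```python
import bisect
from collections import defaultdict
from datetime import datetime
from typing import Any, Dict, List


class TypedEventLog:
    """Events grouped by type, each group kept sorted by timestamp."""

    def __init__(self) -> None:
        self._times: Dict[str, List[datetime]] = defaultdict(list)
        self._events: Dict[str, List[Dict[str, Any]]] = defaultdict(list)

    def append(self, event_type: str, at: datetime, event: Dict[str, Any]) -> None:
        # Insert in timestamp order so range queries can use binary search.
        i = bisect.bisect_right(self._times[event_type], at)
        self._times[event_type].insert(i, at)
        self._events[event_type].insert(i, event)

    def query(self, event_type: str, start: datetime, end: datetime) -> List[Dict[str, Any]]:
        """Events of one type with start <= timestamp < end."""
        times = self._times[event_type]
        lo = bisect.bisect_left(times, start)
        hi = bisect.bisect_left(times, end)
        return self._events[event_type][lo:hi]


event_log = TypedEventLog()
event_log.append("page_view", datetime(2014, 3, 2), {"url": "/b"})
event_log.append("page_view", datetime(2014, 3, 1), {"url": "/a"})
event_log.append("signup", datetime(2014, 3, 1), {"user": "ann"})
print(event_log.query("page_view", datetime(2014, 3, 1), datetime(2014, 3, 2)))
# [{'url': '/a'}]
```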

Using DynamoDB raw means that events can have arbitrary shapes (including nested data structures) and you just dump them in verbatim. Then you only worry about your aggregate schema in Datomic.

In cqrs-server, we are also tagging the transaction with the event uuid, should you need to pull out the raw event from Dynamo.
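A rough sketch of the two-store arrangement, assuming a dict keyed by UUID stands in for the DynamoDB table and a list of dicts stands in for Datomic transactions; `record`, `project_order_total` and `raw_event_for` are hypothetical names, not cqrs-server functions. The raw event keeps its arbitrary nested shape, the aggregate side stores only the derived fact, and the transaction carries the event UUID as the way back.

```python
import uuid
from typing import Any, Dict, List

raw_events: Dict[uuid.UUID, Dict[str, Any]] = {}  # stands in for the DynamoDB table
aggregate_txs: List[Dict[str, Any]] = []          # stands in for Datomic transactions


def record(event: Dict[str, Any]) -> uuid.UUID:
    """Dump the event verbatim, whatever its shape, under a fresh UUID."""
    event_id = uuid.uuid4()
    raw_events[event_id] = event
    return event_id


def project_order_total(event_id: uuid.UUID) -> None:
    """Write only the aggregate fact, tagging the transaction with the event UUID."""
    event = raw_events[event_id]
    total = sum(line["qty"] * line["price"] for line in event["lines"])
    aggregate_txs.append({"facts": {"order/total": total}, "tx/event-id": event_id})


def raw_event_for(tx: Dict[str, Any]) -> Dict[str, Any]:
    # Follow the tag back to the untouched original event.
    return raw_events[tx["tx/event-id"]]


eid = record({"type": "order_placed",
              "lines": [{"sku": "a", "qty": 2, "price": 3}, {"sku": "b", "qty": 1, "price": 4}],
              "client": {"ip": "10.0.0.1", "ua": "curl"}})
project_order_total(eid)
print(aggregate_txs[0]["facts"])                  # {'order/total': 10}
print(raw_event_for(aggregate_txs[0])["client"])  # {'ip': '10.0.0.1', 'ua': 'curl'}
```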

Oh, I see. You're right -- I misunderstood. I guess I have a sort of reflex/twitch going on about CQRS/ES vs. Datomic. Mea culpa. :)

Datomic is a great event log, but it is explicitly not meant to be used in high write volume environments, as it is limited by the serial transactor.

I think of Datomic as much more than a write log, since it includes the Datalog query language.

Also, how are you defining "high write volume"? Have you looked at metrics of how fast Datomic's serial transactor can be?

From the FAQ: "When is Datomic not a good fit? Datomic is not a good fit if you need unlimited write scalability, or have data with a high update churn rate (e.g. counters). And, at present, Datomic does not have support for BLOBs."
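A rough illustration of why a single serial writer caps write scalability, assuming nothing about Datomic's internals beyond the fact that one transactor applies all writes in one order. `SerialWriter` is a hypothetical name: any number of producer threads can submit, but every write passes through one queue and one thread, so total throughput is bounded by what that thread can apply, however many producers there are.

```python
import queue
import threading
from typing import Any, Callable, List


class SerialWriter:
    """Funnel all writes through one queue and one applying thread."""

    def __init__(self, apply: Callable[[Any], None]) -> None:
        self._queue: "queue.Queue[Any]" = queue.Queue()
        self._apply = apply
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is None:  # sentinel: stop the writer
                return
            self._apply(item)  # writes are applied strictly one at a time

    def submit(self, tx: Any) -> None:
        self._queue.put(tx)  # safe to call from many producer threads

    def close(self) -> None:
        self._queue.put(None)
        self._thread.join()


applied: List[int] = []
writer = SerialWriter(applied.append)
producers = [threading.Thread(target=lambda base=b: [writer.submit(base + i) for i in range(100)])
             for b in (0, 1000, 2000, 3000)]
for p in producers:
    p.start()
for p in producers:
    p.join()
writer.close()
print(len(applied))  # 400
```

The upside of this design is a single total order of writes with no coordination between writers; the cost is exactly the limit the FAQ describes.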

I agree Datomic is far more than a write log but in the context of this thread, the discussion was about write logs.

Interesting. Couldn't you achieve the same thing by aggregating data into an RDBMS "when and where [you] can make useful sense out of it"?
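For comparison, a minimal sketch of that alternative, assuming an in-memory SQLite database (via Python's built-in `sqlite3`) as the relational read model and hypothetical page-view events. The projection step has the same shape as the Datomic version: raw events stay where they are, and only the aggregate is materialized.

```python
import sqlite3

# Hypothetical events, already stored elsewhere in their raw form.
events = [
    {"type": "page_view", "user": "ann", "url": "/"},
    {"type": "page_view", "user": "ann", "url": "/pricing"},
    {"type": "page_view", "user": "bob", "url": "/"},
    {"type": "signup", "user": "bob", "plan": "free"},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE views_per_user (user_id TEXT PRIMARY KEY, views INTEGER NOT NULL)")

# Project only the events this aggregate needs; everything else stays in the event store.
for event in events:
    if event["type"] == "page_view":
        conn.execute("INSERT OR IGNORE INTO views_per_user (user_id, views) VALUES (?, 0)",
                     (event["user"],))
        conn.execute("UPDATE views_per_user SET views = views + 1 WHERE user_id = ?",
                     (event["user"],))
conn.commit()

rows = conn.execute("SELECT user_id, views FROM views_per_user ORDER BY user_id").fetchall()
print(rows)  # [('ann', 2), ('bob', 1)]
```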