Hacker News new | ask | show | jobs
by JulianWasTaken 1145 days ago
I don't doubt your story of course, and I love Postgres, but comparing apples to oranges no?

Datomic's killer feature is time travel.

Did you simply not use that feature once you moved off Datomic (and if so why'd you pick Datomic in the first place)? Or are you using Postgres using some extension to add in?

3 comments

We implemented it in Postgres with 'created_at' and 'deleted_at' columns on everything and filtering to make sure that the object 'exists' at the time the query is concerned with. Changes in relationships between objects are modeled as join tables with a boolean indicating whether the relationship is made or broken and at what time.

Our data model is not large and we had a very complete test suite already, so it was easy to produce another implementation backed by postgres, RAM, etc.

Yeah, it seems you could be able to substitute thoughtful schema design avoiding updates/deletes for time-travel as a feature.

I wonder if anyone has made a collection of reference examples implemented this way (and in general think that a substantial compendium good examples of DB schema and thinking behind them could be worthwhile).

It’s called a slowly changing dimension. In this example, it’s a type-2.

https://en.m.wikipedia.org/wiki/Slowly_changing_dimension

I'm moderately confident you could mechanically transform a time-oblivious schema into a history-preserving one, and then write a view on top of it which gave a slice at a particular time. Moderately.
That is essentially what MVCC does.
Yes, although AFAIK those hidden MVCC columns (xmin, xmax?) aren't very usable from an application standpoint -- the obsoleted rows only hang around until the next VACUUM, right?

I realize you're not claiming those columns are useful from an application perspective. Just curious to know if I'm wrong and they are useful.

Because as I understand it, the selling point of Datomic is their audit trail functionality and that is admittedly a bit onerous to implement in a RBDMS. Even though I feel like every project needs/requires that eventually.

I meant MVCC is the proof that you can automate the transform of a schema into a versioned schema. How and if the DBMS exposes that is another concern.

The garbage collection / VACUUM part of an MVCC system is the harder part, saving all versions and querying a point in time is the easy one.

Oracle lets you use the MVCC data to query past states of the database, called "flashback":

https://docs.oracle.com/en/database/oracle/oracle-database/2...

You can configure how long the old data is kept:

https://docs.oracle.com/en/database/oracle/oracle-database/2...

Worked examples:

http://www.dba-oracle.com/t_rman_149_flasbback_query.htm

That is an extremely good point.
Snodgrass wrote a whole book on that topic: Developing Time-Oriented Database Applications in SQL (1999).

It is available as PDF on his publications page:

https://www2.cs.arizona.edu/~rts/publications.html

Maybe search around on bitemporal database table modeling.
Ive built a couple systems that would have been datomic’s bread and butter.

Each time the company was more comfortable with mainstream dbs, so we ended going with something like you’re talking about, built on top of a db. A couple of the projects were because a mainstream dbs wouldn’t scale.

The systems definitely worked, but it was also a lot of implementation complexity on an other wise simple business prop: “store this data as facts”

https://www.postgresql.org/docs/11/contrib-spi.html#id-1.11.... discusses a model for implementing time travel in Postgres <12 using SPI. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit... discusses why it was removed in Postgres 12 - it seems logical that it's more maintainable to implement in plpgsql, though as far as I can tell there aren't off-the-shelf implementations of this.

We use https://django-simple-history.readthedocs.io/en/latest/ (with some custom tooling for diff generation) for audit logs and resettability, and while you can't move an entire set of tables back in time simultaneously, it's usually sufficient for understanding data history.

Datomic's 'time travel' is an audit feature, not something for your application/business logic to depend on. Performance reasons make it impractical, unless you only have like 10 users and very little data.
That's certainly not how it sells and markets itself.

The first feature on benefits (and the only reason I've ever heard Datomic brought up and/or considered it myself for production workflows) is using that stuff in application workflows: https://docs.datomic.com/pro/time/filters.html#history

Could be you're saying it in fact doesn't work well performance-wise, that'd (surprise me but) certainly explain why it's not more popular -- but I think it's clear it wants you to use this as an application feature.

Welcome to sales tactics ;)

Datomic is great but as another commenter said, is good for "small-ish backoffice systems that never has to be web scale". You almost probably can rely on querying history for internal applications. I think their primary market was for companies to use it internally but they never made this clear.

Ironically, Hickey fired the one marketer they hired for Datomic.

He lucked out when a unicorn went all in on it. Word around Cognitect was, Datomic was barely breaking even.

> "small-ish backoffice systems that never has to be web scale". Doesn't production use of Datomic by Nubank and Netflix (to mention just two examples) belie this assertion?
Alternatively, Datomic wasn't performing up to snuff, and they found it cheaper to buy Cognitect than do a DB migration :D
Do those companies specify what they use it for? They probably have their own internal "small-ish backoffice systems".
Nubank is one thing, but for Netflix, just like for any big company 10000 DB technologies are probably in use at the same time.

And 9996 of them are used for stuff like the internal HR DB or other minor projects.