Hacker News new | ask | show | jobs
by sjwright 2171 days ago
I've been begging for exactly this for quite some time. Because of the way I use databases, I've always been bewildered why this wasn't a core part of SQL from the very beginning.

From what I'm reading there's still a lot to be fleshed out to be maximally useful to me, but even in its current state I could imagine using this.

— I'd like to have a field property that limits stored values to a single version and thus is automatically cleared whenever the row is updated. This would be useful for inlining change annotations, and for associating a user_id to specific changes.

— I'd like to be able to arbitrarily select the n-1 value of fields regardless of their time period. E.g.

  select username, previous(username)
  from users
— When viewing a specific version, I'd like to know whether a field's value was supplied in that revision. That's distinct from if the field was changed. I want to know if the value was supplied—even if it was identical to the previous value.

— This might be possible already (it's hard to tell) but I'd like to be able to query/join on any revision. For example I might want to ask the question "show me all products that james has ever modified". That could then get more specific, e.g. "show me all products where james changed the price".

2 comments

> I've always been bewildered why this wasn't a core part of SQL from the very beginning.

It's a long and messy history (no pun intended), but essentially it was rarely practical to consider retaining database history for the first few decades of SQL, due to physical storage costs & limitations. Snodgrass and Jensen proposed initial bitemporal extensions in the 90s and lot of research was done subsequently, but most vendors didn't make their move until the 2011 standard was formed (Oracle Flashback being the most notable exception). Unfortunately the rollout of the 2011 temporal standard has been underwhelming across the board, as each vendor ended up implementing something subtly different, which I think has massively hindered adoption. Since then I would guess that "immutability" has been the largest driving force behind the resurgence of interest.

> it was rarely practical to consider retaining database history for the first few decades of SQL

That does make sense from a historical perspective and I don't doubt that's why. But still I find it unsatisfying because any competent database schema will always retain the history that needs to be retained. If you don't have the storage capacity, you choose to not store so much history. If you don't have native concepts for storing history, you kludge it yourself.

Whether you have native temporal support or have to kludge a DIY solution in the schema, the data you need to store gets stored.

My frustration is that I feel that temporal concepts should have been deeply native to SQL right to its core. History should have been as fundamental to database design as columns and rows. It should be a thing you turn off when you don't want it, not a thing you turn on when you do.

> temporal concepts should have been deeply native to SQL right to its core

Oh absolutely. I think the original intuition by Snodgrass et al. in TSQL2 to model temporality outside of the actual relational structure was a more promising direction, otherwise the complexity of composing joins across many tables, each with independent temporal versioning, seems rather overwhelming (note that the article doesn't discuss joins at all). Schema migration is another hairy topic that the 2011 standard barely addresses.

Modelling temporality outside of the database is what we've been pushing ahead with on Crux, which provides Datalog as the native query language but will imminently also support a simple bitemporal flavour of SQL for point-in-time querying (using Apache Calcite), where application & system time are defined by a single "as-of" prefix at the beginning of the statement that applies across the entire query: https://github.com/juxt/crux/tree/master/crux-sql#bitemporal... (queries over ranges and periods are currently out-of-scope)

Joining on self by a different time range from the current is probably doable?