| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by sa46 969 days ago

I love temporal tables. I built an implementation in Postgres [1] in a few hundred lines of PLPGSQL. We've run the implementation for several years across a few hundred tables for an ERP product.

The core ideas of the implementation:

- The current table is a typical Postgres table. The past table's primary key is (curr_tbl.pk, asr), where the asr column is the valid period of the row using a domain on tstzrange.

- Create statement-based triggers on the current table to copy the after-image of modified rows to the past table. Statement-based triggers amortize function call overhead compared to row-base triggers (I think depesz compared the performance and found about a 10x difference, but I can't find the link).

- Copying the row after-image on update and delete to the past table is advantageous because inserting a new row has no overhead. The disadvantage is that it's harder to figure out when a row was deleted (requires checking for gaps in the past table).

Some pointers if you dive into uni-temporal tables in Postgres:

- Separate the past table and current table into different schemas. Most database users shouldn't modify the past tables. It's easier to grant permission by schema and it makes autocomplete nicer. By convention, we use erp.invoice and erppast.invoice.

- Use a domain type instead of tstzrange to blunt the sharp edges of tstzrange. See the temporal.period domain type in [1].

- Resist the temptation to query against the past table. Temporal joins are fabulously complex without first-class database support (like Oracle's AS OF).

- Optimize for the typical case: querying the current table. Our first temporal implementation used table inheritance consisting of three tables: a parent table, the current table, and a past table. Theoretically, the parent table lets you query data transparently across the parent and child tables. In practice, we didn't use the parent query capability at all. Having a parent table made code review harder by requiring the reviewer to check that the code queried the current table and not the parent table. It's easy enough and rare enough to query all versions of a row by using:

    SELECT * FROM current_table
    UNION ALL
    SELECT * FROM past_table

- Track the current and past tables in a metadata table. Use tests to check that columns don't drift between the current and past tables. See misaligned_cols.sql in [1] for an example.

- Unfortunately, you can't use declarative partitioning because the primary key of the past table differs from the current table. The past table must include the valid period column to differentiate it from other versions of the same row.

[1]: https://gist.github.com/jschaf/cd1c1a3c2a5897282929ee5e16f94...

1 comments

robocat 969 days ago

How do you manage schema changes?

link

sa46 969 days ago

Table renames don’t break anything.

Changing columns must be done to both tables in a transaction. Unit tests verify that the table columns match.

link