Hacker News new | ask | show | jobs
by aidos 620 days ago
The updates wouldn’t be in place anyway because of the Postgres mvcc approach, right?

I can see the advantage in terms of just needing a single tuple for a reads. So a timestamp + value model would likely take twice as much heap space than your approach?

Given that you’re probably always just inserting new data you could use a brin index to get fast reads on the date ranges. Would be interesting to see it in action and play around to see the tradeoffs. The model you’ve settled on sounds like it would be a pain to query.

2 comments

> The updates wouldn’t be in place anyway because of the Postgres mvcc approach, right?

They might be if the columns being updated weren’t indexed [0], but since it sounds like at least one is, no, not in-place.

Though interestingly, your comment on BRIN indexes is quite relevant, as that’s the one type of index that HOT can still work with.

[0]: https://www.postgresql.org/docs/current/storage-hot.html

Good observation. The updates are as much HOT-updates as possible. I wasn't familiar with BRIN-indexes before, so I have to look into that.

At first glance our solution follows a similar approach, let me elaborate:

- no index columns are updated ever, only inserted

- all tables are partitioned based on date (partition range is 1 month)

- for some tables there is another layer of partitioning (3 sub-partitions, based on one specific column)

- finding an appropriate fillfactor is important to improve the speed of UPDATE statements (HOT-updates)

- standard vacuum / auto vacuum settings work great for us so far.

- to improve ANALYZE performance, set column statistics of value-only columns to 0.