| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by rtuin 615 days ago

Glad you asked!

This system contains measurements and state of physical devices (time series). It’s designed for both heavy write and read, with slight emphasis on write. Each table is one type of device and contains 1 to 5 different measurements/states. But here’s the trick: because data is queried with minimum bucket size of 15minutes I figured we could just create a column for each measurement + quarter of the day (i.e. measure0000, measure0015), so that’s 100 columns for each measurement (96 quarter + 4 for DST), include the date in the key, et voila: excellent write performance (because it’s mainly UPDATE queries) and good read performance.

Okay, the queries to make sense of the data aren’t pretty, but can be generated.

I find it really cool how effective this is for time-series data without Postgres extensions (we’re on RDS).

4 comments

napsterbr 615 days ago

One interesting thing your team may want to look into (if you haven't already) is compression.

Of course there are a multitude of variables we don't have access from the outside, but Postgres only compresses data that is TOASTed, and based on your description of the table, the data is not being TOASTed (and therefore not being compressed).

Instead, if you could somehow pack your timeseries entries into an array, you would get the benefits of compression automatically.

Given your write performance requirements, using an array may be out-of-question (and you may get too much overhead from dead tuples) -- but who knows? Always a good idea to benchmark.

I actually considered mentioning this at the post but figured it was too long already and could be the material for a future one :)

link

rtuin 614 days ago

This is interesting! Turning the values into arrays is not suitable for our base-tables, but might be for the several materialization tables we run.

link

aidos 615 days ago

The updates wouldn’t be in place anyway because of the Postgres mvcc approach, right?

I can see the advantage in terms of just needing a single tuple for a reads. So a timestamp + value model would likely take twice as much heap space than your approach?

Given that you’re probably always just inserting new data you could use a brin index to get fast reads on the date ranges. Would be interesting to see it in action and play around to see the tradeoffs. The model you’ve settled on sounds like it would be a pain to query.

link

sgarland 615 days ago

> The updates wouldn’t be in place anyway because of the Postgres mvcc approach, right?

They might be if the columns being updated weren’t indexed [0], but since it sounds like at least one is, no, not in-place.

Though interestingly, your comment on BRIN indexes is quite relevant, as that’s the one type of index that HOT can still work with.

[0]: https://www.postgresql.org/docs/current/storage-hot.html

link

rtuin 614 days ago

Good observation. The updates are as much HOT-updates as possible. I wasn't familiar with BRIN-indexes before, so I have to look into that.

At first glance our solution follows a similar approach, let me elaborate:

- no index columns are updated ever, only inserted

- all tables are partitioned based on date (partition range is 1 month)

- for some tables there is another layer of partitioning (3 sub-partitions, based on one specific column)

- finding an appropriate fillfactor is important to improve the speed of UPDATE statements (HOT-updates)

- standard vacuum / auto vacuum settings work great for us so far.

- to improve ANALYZE performance, set column statistics of value-only columns to 0.

link

sevensor 615 days ago

Fascinating. How reliable is the measurement apparatus? Or to put it another way, how tolerant of missed measurements is this scheme? I’ve been in a similar situation in a past life and we took a different approach. I was never satisfied with how we ended up doing it though.

link

rtuin 614 days ago

I see your point. Some of our measurements are counters and interpolate missing data via certain rules. We store these interpolations in a separate table with exactly the same format and join the two tables directly when querying the data. Since the real value always takes precedence over the interpolated value this is just a COALESCE(realmeasure0000, interpolation0000) statement.

This is super fast when taking advantage of postgres' partition-wise joins.

link

sevensor 614 days ago

That’s really neat, thanks for explaining!

link

sgarland 615 days ago

I assume you looked at the various Postgres functions, like width_bucket()?

I’ve never ran a PG-based TSDB so I’m reluctant to pass judgment one way or the other, but as I hope you can understand, “we have hundreds of columns in a table” is alarming at first glance.

link

rtuin 614 days ago

The schema is organized by how we need to query it which is mostly by fetching all measurements for an entire day or month. Querying statistics on those metrics is not often needed, maybe a few times per week or day, so we do that in Snowflake if necessary. Or create a specialized streaming data flow when it the stats are queried more often.

link