
by benwilson-512 1699 days ago

We've got a few billion rows in TSDB and are pretty happy with it so far. Our workload fits OLTP more than OLAP, though: we're processing and analyzing individual data points from IoT devices as they come in, and then providing various visualizations. This tends to mean that we're doing lots of fetches of relatively small subsets of the data at a time, rather than trying to compute summaries of large subsets.

Compression is seriously impressive: we see a ~90% compression rate on our real-world datasets. Having that data right next to our regular Postgres tables and being able to operate on it all transactionally definitely simplifies our application logic.

Where I see a lot of folks run into issues with TimescaleDB is that it does require your related data models to hold on to relevant timestamps. If you want to query a hypertable efficiently, you always want to be able to specify the relevant time range so that it can ignore irrelevant chunks. This may mean that you need to put data_starts_at, data_ends_at columns on various other tables in your database to make sure you always know where to find your data. This is actually just fine though, because it also means you have an easy record of those min / max values on hand and don't need to hit the hypertable at all just to ask "When did I last get data for this device?"
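
A minimal sketch of that bookkeeping, for illustration only. It uses SQLite from Python's standard library as a stand-in for Postgres, so nothing here is TimescaleDB-specific; in a real setup the `readings` table would presumably be the hypertable. The table names, the `record_reading`, `last_seen`, and `readings_for` helpers, and the ISO-8601 text timestamps are all assumptions, not something described in the comment above.

```python
import sqlite3

# SQLite stands in for Postgres; `readings` plays the role of the hypertable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE devices (
    id INTEGER PRIMARY KEY,
    data_starts_at TEXT,  -- earliest reading timestamp seen for this device
    data_ends_at TEXT     -- latest reading timestamp seen for this device
);
CREATE TABLE readings (
    device_id INTEGER NOT NULL,
    ts TEXT NOT NULL,     -- ISO-8601 strings sort chronologically
    value REAL NOT NULL
);
""")


def record_reading(device_id, ts, value):
    """Insert a reading and widen the device's time bounds in one transaction."""
    with conn:
        conn.execute("INSERT OR IGNORE INTO devices (id) VALUES (?)", (device_id,))
        conn.execute("INSERT INTO readings VALUES (?, ?, ?)", (device_id, ts, value))
        conn.execute(
            """UPDATE devices
               SET data_starts_at = MIN(COALESCE(data_starts_at, ?), ?),
                   data_ends_at = MAX(COALESCE(data_ends_at, ?), ?)
               WHERE id = ?""",
            (ts, ts, ts, ts, device_id),
        )


def last_seen(device_id):
    """Answer 'when did I last get data?' without touching `readings`."""
    row = conn.execute(
        "SELECT data_ends_at FROM devices WHERE id = ?", (device_id,)
    ).fetchone()
    return row[0] if row else None


def readings_for(device_id):
    """Fetch a device's readings with an explicit time range on the query."""
    bounds = conn.execute(
        "SELECT data_starts_at, data_ends_at FROM devices WHERE id = ?",
        (device_id,),
    ).fetchone()
    if bounds is None or bounds[0] is None:
        return []
    return conn.execute(
        """SELECT ts, value FROM readings
           WHERE device_id = ? AND ts BETWEEN ? AND ?
           ORDER BY ts""",
        (device_id, bounds[0], bounds[1]),
    ).fetchall()
```

The point of `readings_for` is only that the time predicate is always present in the query; with a hypertable, that predicate is what lets irrelevant chunks be skipped.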


qorrect 1699 days ago

> Compression is seriously impressive

Does this affect your query performance?


benwilson-512 1699 days ago

In practice we've seen it actually improve performance, because when fetching a data range for a device, fewer actual rows have to be fetched from disk. You pick certain columns (like device ID) that remain uncompressed and indexed for rapid querying, and then the actual value columns are compressed for a range of time.
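
A toy model of that layout, as a sketch only: the device ID stays as a plain lookup key while the timestamp and value columns for one time range are packed into a single compressed blob per device. This is not TimescaleDB's actual storage format or compression algorithm; `compress_chunk` and `fetch_device` are hypothetical names, and JSON plus zlib are used purely because they ship with Python.

```python
import json
import zlib
from collections import defaultdict


def compress_chunk(rows):
    """Pack one time range of (device_id, ts, value) rows into one blob per device.

    The device ID remains an uncompressed dictionary key; the ts and value
    columns are stored column-wise and compressed together.
    """
    by_device = defaultdict(lambda: {"ts": [], "value": []})
    for device_id, ts, value in sorted(rows):
        by_device[device_id]["ts"].append(ts)
        by_device[device_id]["value"].append(value)
    return {
        device_id: zlib.compress(json.dumps(cols).encode("utf-8"))
        for device_id, cols in by_device.items()
    }


def fetch_device(chunk, device_id):
    """Look up one device by its uncompressed key and decompress only its blob."""
    blob = chunk.get(device_id)
    if blob is None:
        return []
    cols = json.loads(zlib.decompress(blob))
    return list(zip(cols["ts"], cols["value"]))
```

In this model a fetch for one device reads a single stored entry instead of one entry per data point, which is the effect described above.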


qorrect 1699 days ago

Very cool, thanks for sharing.

> This may mean that you need to put data_starts_at, data_ends_at columns on various other tables in your database to make sure you always know where you find your data.

Do you have a link to docs for this? Does this literally mean putting a start column (xstartx) first and an end column (xendx) last? How do you then use it?

Thanks so much!
