| HN Mirror


by benwilson-512 2428 days ago

Hey! We're looking to evaluate TimescaleDB for a logistics IoT scenario. Some of the data that enters our system comes from connected devices where recorded_at and inserted_at columns are basically the same. Some data however is sourced from dataloggers that may record for months before the data arrives at our system.

With TimescaleDB, would I use the recorded_at or inserted_at column for the hypertable?

Does this change if data for an individual sensor can sometimes arrive out of order? If the sensor malfunctions and the data contains timestamps in the far past or the far future, does this cause issues with TimescaleDB?

What we've done in Postgres so far is structure the data tables around the recorded_at column, because most analysis wants to look at the data "in order" to generate reports, graphs, etc. Each data row also contains a "payload_id" relating it to a "payloads" table, which helps group data by when it actually hit the system. Data processing has generally been built around the payloads; if we need to look back or forward in time, we then query any additional data in recorded_at order on the main data tables.

1 comments

RobAtticus 2428 days ago

For choosing the column, you'll usually want to think about what your queries will be using. It sounds like `recorded_at` is probably more likely to be useful since that's when the data "occurred," but again it depends on your expected query load.

Out of order data should be handled fine by TimescaleDB -- if you do have data that is far in the future or in the past, you may get stray chunks to hold those, but it's not going to create all the intermediate chunks or anything that might be undesirable. You can later correct those fields by deleting and reinserting the record with a corrected timestamp.
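To make the "stray chunks" point concrete, here is a rough conceptual sketch in Python of fixed-width time bucketing. It is not TimescaleDB's actual implementation: the 7-day interval, the epoch alignment, and the names `chunk_start` and `assign_chunks` are all assumptions made for illustration. The idea it shows is that a bucket only exists once a row lands in it, so an outlier timestamp adds one extra bucket rather than every bucket in between.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumed values for illustration only; not TimescaleDB defaults or internals.
CHUNK_INTERVAL = timedelta(days=7)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def chunk_start(ts: datetime) -> datetime:
    """Return the start of the fixed-width interval that contains ts."""
    n = (ts - EPOCH) // CHUNK_INTERVAL  # floor division, also correct before the epoch
    return EPOCH + n * CHUNK_INTERVAL


def assign_chunks(timestamps):
    """Group timestamps by interval; only intervals that receive data get an entry."""
    chunks = defaultdict(list)
    for ts in timestamps:
        chunks[chunk_start(ts)].append(ts)
    return dict(chunks)


readings = [
    datetime(2019, 3, 4, 12, 0, tzinfo=timezone.utc),
    datetime(2019, 3, 5, 9, 30, tzinfo=timezone.utc),
    datetime(2019, 3, 4, 8, 0, tzinfo=timezone.utc),   # arrives out of order
    datetime(2031, 1, 1, 0, 0, tzinfo=timezone.utc),   # malfunctioning sensor clock
]

chunks = assign_chunks(readings)
for start, rows in sorted(chunks.items()):
    print(start.date(), len(rows))
```

In this toy model the three 2019 readings share one bucket regardless of arrival order, and the 2031 reading gets a single bucket of its own, with nothing created for the years in between.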


jnordwick 2428 days ago

Sounds like he's describing a bitemporal database (although not in the canonical form usually associated with them).

I've looked into timescaledb for this, and it doesn't support them.
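For readers unfamiliar with the term, a minimal Python sketch of the bitemporal idea might look like the following. It maps the thread's `recorded_at` onto valid time and `inserted_at` onto transaction time; the `Reading` class and the `as_of` helper are hypothetical names, and a real bitemporal store also handles corrections and retractions, which this simplification ignores.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Reading:
    sensor_id: str
    recorded_at: datetime  # valid time: when the measurement happened
    inserted_at: datetime  # transaction time: when the system learned of it
    value: float


def as_of(readings, known_at, start, end):
    """Readings recorded in [start, end) that the system had received by known_at,
    returned in recorded_at order."""
    visible = [
        r for r in readings
        if r.inserted_at <= known_at and start <= r.recorded_at < end
    ]
    return sorted(visible, key=lambda r: r.recorded_at)


readings = [
    # Connected device: recorded and inserted at nearly the same moment.
    Reading("live-1", datetime(2019, 1, 10), datetime(2019, 1, 10), 4.2),
    # Datalogger: recorded in January, uploaded months later.
    Reading("logger-7", datetime(2019, 1, 5), datetime(2019, 4, 1), 3.9),
]

january = (datetime(2019, 1, 1), datetime(2019, 2, 1))
seen_in_february = as_of(readings, datetime(2019, 2, 1), *january)
seen_in_may = as_of(readings, datetime(2019, 5, 1), *january)
```

Here `seen_in_february` contains only the live reading, while `seen_in_may` also includes the late datalogger reading, sorted ahead of the live one because it was recorded earlier.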


refset 2428 days ago

I agree. Bitemporal databases can natively handle late-arriving data in these kinds of upstream timestamp integration scenarios.

However, the intersection of bitemporal indexes and columnar time-series queries seems important, and yet I haven't seen anything that looks like it might offer both, possibly aside from kdb+ and SAP HANA.

Disclosure: I work on https://github.com/juxt/crux (which is optimised for bitemporal graph joins and doesn't currently employ columnar indexes)
