Hacker News
by silvester23 2053 days ago
Do you have any plans for supporting versioned time series? As in "show me the time series as it was on 2020-11-11T08:00:00Z"? Most of our time series are forecasts that change over time, and that would be super helpful.

Anyway, thanks for your work!

3 comments

Hi, one of the creators of IOx here. Unlike InfluxDB, where you can only do efficient range-based queries on one time value, in IOx you will be able to apply predicates efficiently to any column. That means you can have several columns for different times (e.g., time created, forecast time, expiry time).

You will be able to do efficient range queries on any of these. Most use cases will work best when you partition your data into time chunks, commonly keyed on the time you inserted the samples into the database (the created time in your case, I guess).
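
To make the "several time columns" idea concrete, here is a minimal sketch in plain Python. It is not IOx's API or query language; the column names (`created`, `forecast_time`) and the helper `in_range` are hypothetical and only illustrate applying a range predicate to whichever time column you choose.

```python
from datetime import datetime, timezone

def ts(s):
    """Parse an ISO 8601 string and mark it as UTC."""
    return datetime.fromisoformat(s).replace(tzinfo=timezone.utc)

# Each row carries more than one time column (names are assumptions).
rows = [
    {"created": ts("2020-11-10T08:00:00"), "forecast_time": ts("2020-11-12T00:00:00"), "value": 10.0},
    {"created": ts("2020-11-11T07:00:00"), "forecast_time": ts("2020-11-12T00:00:00"), "value": 11.5},
    {"created": ts("2020-11-11T09:00:00"), "forecast_time": ts("2020-11-12T00:00:00"), "value": 12.0},
]

def in_range(rows, column, start, end):
    """Keep rows whose `column` falls in the half-open range [start, end)."""
    return [r for r in rows if start <= r[column] < end]

# Range predicate on the creation time...
created_on_11th = in_range(rows, "created", ts("2020-11-11T00:00:00"), ts("2020-11-12T00:00:00"))
# ...and the same kind of predicate on the time the forecast is for.
for_the_12th = in_range(rows, "forecast_time", ts("2020-11-12T00:00:00"), ts("2020-11-13T00:00:00"))
```

A real engine would answer these predicates from column statistics or partitions rather than a linear scan; the sketch only shows the shape of the data and the queries.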

I don't think the time that the forecast is for should be the main index. Rather, the time at which the forecast was created should be. In general, I think the time index is about what happened at a particular point in time.

Interesting idea. How would you model the forecast time in this case? I'm not an expert, but to my knowledge you can only have one time index in InfluxDB, right?

I'm not sure about InfluxDB at all, sorry if I gave that impression. I have, however, designed and built data architecture for various forecasting software. "all data you perform analytics on is time series data" (from the article) rings so true with me.

However, you need to consider that, in this sense, the time index is about what is happening at any given moment. In your case, the forecasts are created at a point in time, and that "should" be their index, speaking purely from a time-is-the-ultimate-index perspective.

Just some random questions that might help you answer your own questions:

1. Are there multiple forecasts made at the same time, for the same period in the future, but by different systems / algorithms / hyperparameters? Would you not want to keep multiple, and in that regard, what would be the "latest" forecast?

2. If your latest forecast is necessarily the most accurate, why would you need to keep previous versions in the same database?

None of this might be relevant to you, and maybe you'll only get the benefits of InfluxDB by using the forecast time as the index. But I thought I'd give you my thoughts just in case it helps you.

I think you're on the edge of describing bitemporal databases. You have a range which represents "time over which this fact is asserted to be true" and a second range for "time during which the assertion was recorded as valid". These typically get called "valid time" and "transaction time".

You can express forecasts this way. The valid time range is the range for which you assert a given forecast is going to be true. The transaction time range is the time during which you held that forecast to be the current forecast.

Using transaction time you can then reconstruct your state of belief from any point in time. Using valid time you can make assertions about any fact in ranges over the past, present or future.
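
A minimal sketch of that idea in plain Python, assuming a hypothetical record layout (`valid_from`/`valid_to` for valid time, `tx_from`/`tx_to` for transaction time, with `tx_to=None` meaning "still the current belief"). It is not tied to any particular database.

```python
from datetime import datetime, timezone

def ts(s):
    """Parse an ISO 8601 string and mark it as UTC."""
    return datetime.fromisoformat(s).replace(tzinfo=timezone.utc)

# Two versions of a forecast for the same hour (all field names are assumptions).
forecasts = [
    {"valid_from": ts("2020-11-12T00:00:00"), "valid_to": ts("2020-11-12T01:00:00"),
     "tx_from": ts("2020-11-10T08:00:00"), "tx_to": ts("2020-11-11T09:00:00"), "value": 10.0},
    {"valid_from": ts("2020-11-12T00:00:00"), "valid_to": ts("2020-11-12T01:00:00"),
     "tx_from": ts("2020-11-11T09:00:00"), "tx_to": None, "value": 12.0},
]

def as_of(records, tx_time, valid_time):
    """Return the records that were believed at tx_time and cover valid_time."""
    return [
        r for r in records
        if r["tx_from"] <= tx_time
        and (r["tx_to"] is None or tx_time < r["tx_to"])
        and r["valid_from"] <= valid_time < r["valid_to"]
    ]

# What did we believe on 2020-11-11 at 08:00 about 00:30 on the 12th?
belief_then = as_of(forecasts, ts("2020-11-11T08:00:00"), ts("2020-11-12T00:30:00"))
# And what do we believe once the revised forecast has been recorded?
belief_now = as_of(forecasts, ts("2020-11-12T00:00:00"), ts("2020-11-12T00:30:00"))
```

Here `belief_then` holds the original forecast (10.0) and `belief_now` the revised one (12.0). Superseding a forecast means closing the old row's transaction range and inserting a new row, never overwriting.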

I think Snodgrass's Developing Time-Oriented Database Applications in SQL[0] is still the best overall introduction, though the SQL used is fairly dated (ha). The relevant Wikipedia entry is OK too[1].

[0] https://www2.cs.arizona.edu/~rts/tdbbook.pdf

[1] https://en.wikipedia.org/wiki/Temporal_database

Can't you just have another time key?

Can I? That would be great. I thought InfluxDB did not support that, but I might be wrong; I'm not really an expert.