Hacker News
by cammil 2053 days ago
I don't think the time the forecast is for should be the main index; rather, the time the forecast was created should be. In general, I think of the time index as recording what happened at a particular point in time.

Interesting idea. How would you model the forecast time in this case? I'm not an expert, but to my knowledge you can only have one time index in InfluxDB, right?
I don't know InfluxDB at all; sorry if I gave that impression. I have, however, designed and built the data architecture for various forecasting systems. "all data you perform analytics on is time series data" (from the article) rings very true to me.

However, consider that in this sense it's about what is happening at any given moment. In your case, each forecast is created at a point in time, and, speaking purely from a time-is-the-ultimate-index perspective, that "should" be its index.
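To make the suggestion concrete, here is a minimal, database-agnostic Python sketch (not InfluxDB; the names `rows`, `record` and `latest_for` are hypothetical) in which creation time is the primary index and the forecast's target time is just another field. The "latest" forecast for a target time is then found by scanning backwards through creation time.

```python
from bisect import insort
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical in-memory store. Each row is (issued_at, target_time, value)
# and the list is kept sorted by issued_at, so creation time is the primary index.
rows: List[Tuple[datetime, datetime, float]] = []

def record(issued_at: datetime, target_time: datetime, value: float) -> None:
    """Insert a forecast while keeping rows ordered by creation time."""
    insort(rows, (issued_at, target_time, value))

def latest_for(target_time: datetime) -> Optional[float]:
    """Return the most recently issued forecast for target_time, or None."""
    # Walk backwards through creation time; the first match is the newest.
    for _issued_at, target, value in reversed(rows):
        if target == target_time:
            return value
    return None
```

For example, recording 21.0 for noon on 1 June (issued 30 May) and then 23.5 for the same target (issued 31 May) makes `latest_for` return 23.5, while the older 21.0 row stays available for evaluating past forecasts.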

Just some questions that might help you answer your own:

1. Are multiple forecasts made at the same time, for the same future period, but by different systems, algorithms, or hyperparameters? Wouldn't you want to keep all of them, and in that case, what would the "latest" forecast be?

2. If your latest forecast is necessarily the most accurate, why would you need to keep previous versions in the same database?
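The first question can be illustrated with a small Python sketch, assuming each forecast carries a hypothetical `source` label identifying the system, algorithm, or hyperparameter set that produced it. Once several sources forecast the same period, "latest" is only well defined per source.

```python
from datetime import datetime
from typing import Dict, List, NamedTuple

class Forecast(NamedTuple):
    issued_at: datetime  # when the forecast was created
    target: datetime     # the future period being forecast
    source: str          # hypothetical label for model/algorithm/hyperparameters
    value: float

def latest_per_source(forecasts: List[Forecast], target: datetime) -> Dict[str, Forecast]:
    """For one target period, keep the most recently issued forecast from each source."""
    latest: Dict[str, Forecast] = {}
    for f in forecasts:
        if f.target != target:
            continue
        current = latest.get(f.source)
        if current is None or f.issued_at > current.issued_at:
            latest[f.source] = f
    return latest
```

With two sources (say, hypothetical `"arima"` and `"gbm"` models) forecasting the same day, the result holds one newest forecast per source rather than a single "latest" value.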

None of this may be relevant to you, and perhaps you'll only get the benefits of InfluxDB by using the forecast time as the index. But I thought I'd share my thoughts in case they help.

I think you're on the edge of describing bitemporal databases. You have one range representing "the time over which this fact is asserted to be true" and a second range representing "the time during which the assertion was recorded as valid". These are typically called "valid time" and "transaction time".
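A minimal Python sketch of a single bitemporal row, assuming half-open `[from, to)` intervals and `datetime.max` as an "until further notice" sentinel. Both are common conventions rather than anything prescribed here, and the field names are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime

# Sentinel for an open-ended upper bound ("until further notice").
FOREVER = datetime.max

@dataclass(frozen=True)
class BitemporalFact:
    value: float
    valid_from: datetime       # valid time: when the fact is asserted to be true
    valid_to: datetime         # exclusive upper bound of valid time
    tx_from: datetime          # transaction time: when the assertion was recorded
    tx_to: datetime = FOREVER  # exclusive; FOREVER means the assertion is still current

    def true_at(self, t: datetime) -> bool:
        """Is the fact asserted to hold at valid time t?"""
        return self.valid_from <= t < self.valid_to

    def recorded_at(self, t: datetime) -> bool:
        """Was this assertion part of the database's state at transaction time t?"""
        return self.tx_from <= t < self.tx_to
```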

You can express forecasts this way. The valid time range is the range for which you assert a given forecast is going to be true. The transaction time range is the time during which you held that forecast to be the current forecast.
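Here is one way recording a new forecast might look, as a Python sketch under simplifying assumptions: the hypothetical `record_forecast` only supersedes rows with exactly the same valid range (real bitemporal systems also split partially overlapping ranges), and old rows are never deleted; their transaction time is just closed.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import List

FOREVER = datetime.max  # open-ended transaction time: "still the current forecast"

@dataclass(frozen=True)
class ForecastRow:
    value: float
    valid_from: datetime       # start of the period the forecast is about
    valid_to: datetime         # end of that period (exclusive)
    tx_from: datetime          # when we started holding this as the current forecast
    tx_to: datetime = FOREVER  # when it was superseded (exclusive)

def record_forecast(rows: List[ForecastRow], value: float,
                    valid_from: datetime, valid_to: datetime,
                    now: datetime) -> List[ForecastRow]:
    """Return a new list where the forecast made at `now` supersedes any
    current forecast for the same valid range."""
    out: List[ForecastRow] = []
    for r in rows:
        same_range = (r.valid_from, r.valid_to) == (valid_from, valid_to)
        if same_range and r.tx_to == FOREVER:
            out.append(replace(r, tx_to=now))  # close the old belief, keep the row
        else:
            out.append(r)
    out.append(ForecastRow(value, valid_from, valid_to, tx_from=now))
    return out
```

Keeping superseded rows with a closed transaction range is what makes it possible to ask later what the forecast used to be.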

Using transaction time you can then reconstruct your state of belief from any point in time. Using valid time you can make assertions about any fact in ranges over the past, present or future.
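Reconstructing a past state of belief is then a filter on both axes at once. This Python sketch assumes at most one row matches a given pair of times (no overlapping valid ranges within a single transaction-time state); the `Row` and `believed` names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

FOREVER = datetime.max

@dataclass(frozen=True)
class Row:
    value: float
    valid_from: datetime
    valid_to: datetime         # exclusive
    tx_from: datetime
    tx_to: datetime = FOREVER  # exclusive

def believed(rows: List[Row], valid_at: datetime, as_of: datetime) -> Optional[float]:
    """What did we believe, at transaction time `as_of`, about valid time `valid_at`?"""
    for r in rows:
        in_tx = r.tx_from <= as_of < r.tx_to              # recorded as current then
        in_valid = r.valid_from <= valid_at < r.valid_to  # asserted true for that moment
        if in_tx and in_valid:
            return r.value
    return None
```

For a forecast of 21.0 for 1 June recorded on 30 May and superseded on 31 May by 23.5, asking about noon on 1 June as of the evening of 30 May gives 21.0, and asking as of 5 June gives 23.5.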

I think Snodgrass's Developing Time-Oriented Database Applications in SQL[0] is still the best overall introduction, though the SQL used is fairly dated (ha). The relevant Wikipedia entry is OK too[1].

[0] https://www2.cs.arizona.edu/~rts/tdbbook.pdf

[1] https://en.wikipedia.org/wiki/Temporal_database