by OldHand2018 2777 days ago
It's pretty ridiculous that "Time-Series Database" has come to mean ingesting massive amounts of streaming data. They've been around a long time and have many use cases.

They're a great way to store data efficiently. If you know the time range you are looking for, accessing specific data is very fast and simple, and you can roll your own in a few dozen lines of C if that's what you want to do. If that's all you need, why not?
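A minimal sketch of the roll-your-own approach for a single series, assuming fixed-size records (a 64-bit timestamp plus a double) appended in non-decreasing timestamp order to one binary file. `Point`, `Series`, `series_open`, `series_append` and `series_range` are hypothetical names for illustration only. Because the records are sorted and fixed-size, a time-range lookup is a binary search on record index followed by a short sequential read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical on-disk layout: a flat array of fixed-size records in
 * timestamp order, written in native byte order (not portable). */
typedef struct {
    int64_t ts;    /* timestamp; units are up to the caller */
    double value;
} Point;

typedef struct {
    FILE *f;
    long count;       /* number of records currently in the file */
    int64_t last_ts;  /* newest timestamp, used to keep the file sorted */
} Series;

/* Wrap a binary file already opened for update ("r+b", "w+b" or tmpfile()). */
static int series_open(Series *s, FILE *f) {
    assert(f != NULL);
    s->f = f;
    s->last_ts = INT64_MIN;
    if (fseek(f, 0, SEEK_END) != 0) return -1;
    long bytes = ftell(f);
    if (bytes < 0) return -1;
    s->count = bytes / (long)sizeof(Point);
    if (s->count > 0) {  /* recover the newest timestamp from the last record */
        Point p;
        if (fseek(f, (s->count - 1) * (long)sizeof(Point), SEEK_SET) != 0 ||
            fread(&p, sizeof p, 1, f) != 1)
            return -1;
        s->last_ts = p.ts;
    }
    return 0;
}

/* Append one point; out-of-order timestamps are rejected with -1. */
static int series_append(Series *s, int64_t ts, double value) {
    if (ts < s->last_ts) return -1;
    Point p = { ts, value };
    if (fseek(s->f, 0, SEEK_END) != 0 || fwrite(&p, sizeof p, 1, s->f) != 1)
        return -1;
    s->count++;
    s->last_ts = ts;
    return 0;
}

/* Read record number i into *p. */
static int read_at(Series *s, long i, Point *p) {
    if (fseek(s->f, i * (long)sizeof(Point), SEEK_SET) != 0) return -1;
    return fread(p, sizeof *p, 1, s->f) == 1 ? 0 : -1;
}

/* Copy up to max points with from <= ts <= to into out; returns the count. */
static size_t series_range(Series *s, int64_t from, int64_t to,
                           Point *out, size_t max) {
    long lo = 0, hi = s->count;
    while (lo < hi) {  /* binary search for the first record with ts >= from */
        long mid = lo + (hi - lo) / 2;
        Point p;
        if (read_at(s, mid, &p) != 0) return 0;
        if (p.ts < from) lo = mid + 1; else hi = mid;
    }
    if (lo >= s->count || fseek(s->f, lo * (long)sizeof(Point), SEEK_SET) != 0)
        return 0;
    size_t n = 0;
    for (long i = lo; i < s->count && n < max; i++) {  /* sequential read */
        Point p;
        if (fread(&p, sizeof p, 1, s->f) != 1 || p.ts > to) break;
        out[n++] = p;
    }
    return n;
}
```

The only "index" here is the sort order itself, which is what keeps it small. It uses native byte order and `long` file offsets, so anything beyond a toy would want a fixed endianness and 64-bit offsets (for example POSIX `fseeko`).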


That may be a perfectly good solution if you have a very static infrastructure and a narrow use case.

As a thought exercise, take the most trivial solution: a single append-only flat file. This may work well for writes, but what happens when you want to read the datapoints for only a single series in time order? That requires an expensive scan over the whole file. An improvement could be to create a file per series, but this becomes problematic when many small datapoints have to be written across many different files. The problem worsens in the case of a dynamic containerised infrastructure which produce a unique number of timeseries over very short intervals. That scenario was the catalyst for the development of Prometheus TSDB v2, as the prior version stored a file per timeseries.
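To make the single-file trade-off concrete, here is a hedged sketch of that "single append-only flat file" design, assuming every point from every series is appended to one shared log as a fixed-size record tagged with a series id, and that points arrive in non-decreasing time order. `LogRecord`, `log_append` and `log_read_series` are hypothetical names; this is not how Prometheus stores data. Writes are a single append, but reading one series has to visit every record in the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record: all series share one log, so their points interleave.
 * Three 8-byte fields, so the struct has no padding. */
typedef struct {
    uint64_t series_id;  /* which series this point belongs to */
    int64_t ts;
    double value;
} LogRecord;

/* Cheap write: every point, whatever its series, goes to the end of the log. */
static int log_append(FILE *log, uint64_t series_id, int64_t ts, double value) {
    LogRecord r = { series_id, ts, value };
    if (fseek(log, 0, SEEK_END) != 0) return -1;
    return fwrite(&r, sizeof r, 1, log) == 1 ? 0 : -1;
}

/* Expensive read: to collect one series we must scan the whole file, even
 * though most records belong to other series. Copies up to max matches into
 * out and returns how many were copied; *scanned receives the total number
 * of records read. If points were appended in time order, the matches come
 * out already sorted by ts. */
static size_t log_read_series(FILE *log, uint64_t series_id,
                              LogRecord *out, size_t max, size_t *scanned) {
    assert(scanned != NULL);
    *scanned = 0;
    if (fseek(log, 0, SEEK_SET) != 0) return 0;
    size_t n = 0;
    LogRecord r;
    while (fread(&r, sizeof r, 1, log) == 1) {
        (*scanned)++;
        if (r.series_id == series_id && n < max) out[n++] = r;
    }
    return n;
}
```

With three interleaved series of ten points each, fetching one series reads all 30 records to return 10. A file per series flips the trade-off: reads touch only the relevant file, but each incoming batch becomes many small writes scattered across files.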

As the post states, there is a balance to strike between the read and write patterns; achieving that in a few lines of C for the general-purpose case is difficult, if not impossible.

To be clear, my post was meant to state that there are many use cases for time-series databases and to bemoan the fact that most current development centers on one specific use case. That is in fact what I wrote.

I have a hard time believing that "a dynamic containerised infrastructure which produce a unique number of timeseries over very short intervals" is the superset of all time-series use cases, but perhaps it is so.