by gtrubetskoy 4102 days ago
Thanks Paul! So you're saying it's all a SMOP :)

Another thing that I think might be a critical (or at least interesting) characteristic is back-filling optimization, i.e. when you need to load a trillion points of historical data. This YouTube talk explains it pretty well and covers how OpenTSDB addresses it: https://www.youtube.com/watch?v=SgD3RD2Shg4

Anyhow - keep up the good work, I very much believe that in the next couple of years "Time Series" is going to become a resume-must-include buzzword :)


Cool, I'll have to take a look at that talk. We've had people ask about backfilling large amounts of data so it's something we'll have to figure out.
Another thing I was curious about is why not do all the clustering/distributed stuff at the db level, i.e. have some sort of distributed BoltDB-like/Raft store as a separate layer or even an entirely separate project, and then InfluxDB would be a much thinner/simpler thing. I think that in general the approach of OpenTSDB and similar things is right, it's just that HBase/Hadoop is such a pain to set up and maintain (and so is Cassandra, if perhaps a little less).
One of the key goals of the project is to be able to aggregate and downsample from raw high precision data. That means we want a framework in which we ship the code to where the data lives, not the other way around.

The abstractions I've seen that have the database layer and then some services on top all miss this. They transport all of the raw data over the network and then run the computations and return the summary ticks back to the user.

Our framework lets us compute the summary ticks locally and send only those back (in many cases, but not all).
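
To make the "ship the code to where the data lives" idea concrete, here is a rough sketch in Python. It is not InfluxDB's actual code or API; the function names (`local_summary`, `merge_means`), the 60-second bucket, and the sample points are all made up for illustration. Each node reduces its own raw points to a small per-bucket (sum, count) pair, and only those pairs cross the network to be merged into mean ticks.

```python
from collections import defaultdict


def local_summary(points, bucket_seconds):
    """Hypothetical per-node step: runs next to the raw data.

    Reduces raw (timestamp, value) points to one (sum, count) pair
    per time bucket, so only the small summary leaves the node.
    """
    buckets = defaultdict(lambda: [0.0, 0])
    for ts, value in points:
        start = ts - ts % bucket_seconds  # start of the bucket this point falls in
        buckets[start][0] += value
        buckets[start][1] += 1
    return {start: (total, count) for start, (total, count) in buckets.items()}


def merge_means(partials):
    """Hypothetical coordinator step: combine per-node partials into mean ticks."""
    totals = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for start, (total, count) in partial.items():
            totals[start][0] += total
            totals[start][1] += count
    return {start: total / count for start, (total, count) in sorted(totals.items())}


# Made-up raw points held on two different nodes: (unix_seconds, value)
node_a = [(0, 1.0), (30, 3.0), (60, 5.0)]
node_b = [(10, 2.0), (70, 7.0)]

# Each node ships back at most one pair per bucket instead of every raw point.
partials = [local_summary(node_a, 60), local_summary(node_b, 60)]
means = merge_means(partials)
print(means)
```

Sending (sum, count) rather than a finished mean is what lets the coordinator combine results from several nodes correctly. Not every aggregate merges this cleanly (an exact median, for example, does not), which is one plausible reading of the "not all" caveat, though that is my guess rather than something stated above.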