by pauldix 4102 days ago
Hi Grisha, I saw that post, thanks for writing it! The coming features you're talking about are the work we're focused on for finishing this release. The three you mention should drop in an RC within two weeks.

The distributed queries part isn't a large amount of work because of how we've designed things. Under the covers the query engine already represents each query as a MapReduce job to be run.

For cluster expansion, work is starting on that today. Again it's just a matter of wiring some things up. Node replacement is also starting today.

We may miss the March goal but it won't be by anything close to 3 months. Glad you're paying attention to the project though :)

For the Foundation problem, I thought they were never open source. Just free for 5 nodes or less, no?

I think the key to avoiding this fate is to build an active community of contributors outside the company. Luckily we have people submitting PRs every week. We'll be trying to document more of the code and make it easier for outsiders to get involved as we go along.

That way if the worst happens, at least the community can fork and keep the project going forward. I'd love nothing more than for Influx to become bigger than this company.


Thanks Paul! So you're saying it's all a SMOP :)

Another thing that I think might be a critical (or at least interesting) characteristic is back-filling optimization, i.e. when you need to load a trillion data points of historical data. This YouTube talk explains it pretty well and covers how OpenTSDB addresses it: https://www.youtube.com/watch?v=SgD3RD2Shg4

Anyhow - keep up the good work, I very much believe that in the next couple of years "Time Series" is going to become a resume-must-include buzzword :)

Cool, I'll have to take a look at that talk. We've had people ask about backfilling large amounts of data so it's something we'll have to figure out.

Another thing I was curious about is why not do all the clustering/distributed stuff at the db level, i.e. have some sort of distributed BoltDB-like/Raft store as a separate layer or even an entirely separate project, and then InfluxDB would be a much thinner/simpler thing. I think that in general the approach of OpenTSDB and similar things is right, it's just that HBase/Hadoop is such a pain to set up and maintain (and so is Cassandra, if perhaps a little less).

One of the key goals of the project is to be able to aggregate and downsample from raw high precision data. That means we want a framework in which we ship the code to where the data lives, not the other way around.

The abstractions I've seen that have the database layer and then some services on top all miss this. They transport all of the raw data over the network and then run the computations and return the summary ticks back to the user.

Our framework lets us compute the summary ticks locally and send only those back (in many cases, but not all).
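
To make the "ship the code to where the data lives" idea concrete, here is a rough Python sketch of a downsampling query split into a map step that runs next to each shard and a reduce step that merges the small partial results. This is not InfluxDB's actual engine or API; the names `SHARDS`, `map_local`, `reduce_partials` and `mean_by_bucket`, and the sample points, are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical shard contents: (unix_seconds, value) points held on each node.
SHARDS = {
    "node-a": [(0, 1.0), (30, 3.0), (60, 5.0)],
    "node-b": [(10, 2.0), (70, 7.0), (110, 9.0)],
}


def map_local(points, bucket_seconds):
    """Map step, meant to run on the node that owns the points.

    Folds raw points into one (sum, count) pair per time bucket, so only
    these pairs would need to cross the network, not the raw points.
    """
    partials = defaultdict(lambda: [0.0, 0])
    for ts, value in points:
        bucket = ts - ts % bucket_seconds  # start of the bucket this point falls in
        partials[bucket][0] += value
        partials[bucket][1] += 1
    return {bucket: (s, n) for bucket, (s, n) in partials.items()}


def reduce_partials(all_partials):
    """Reduce step: merge per-node (sum, count) pairs into a mean per bucket."""
    merged = defaultdict(lambda: [0.0, 0])
    for partials in all_partials:
        for bucket, (s, n) in partials.items():
            merged[bucket][0] += s
            merged[bucket][1] += n
    return {bucket: s / n for bucket, (s, n) in sorted(merged.items())}


def mean_by_bucket(shards, bucket_seconds):
    """Simulates the whole query in one process: map per shard, then reduce."""
    return reduce_partials(map_local(p, bucket_seconds) for p in shards.values())


if __name__ == "__main__":
    print(mean_by_bucket(SHARDS, 60))
```

A mean works here because it decomposes into a sum and a count that can be merged later. An exact median does not merge from per-node results that way, which may be one reason the local-summary path applies in many cases but not all.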