Hacker News
by manigandham 2272 days ago
Hey Gian, I've been familiar with Druid since its start at Metamarkets (and was a client of that company). I've been following Imply, and you guys have done great work making Druid a lot better over the years.

I guess I should've stated relational columnstore to describe the others. Vertica has S3/remote storage interfaces similar to Historicals and all vendors are adding indexing to columnstore segments beyond partition/zone maps for fast seeks. MemSQL is the most advanced with in-memory tables to augment the disk-based columnstores.

The improved SQL support will help and the overall design of Druid makes sense, but I have to stand by my view that it's tough to recommend over the alternatives now. If everything's converging on similar functionality, what would you say is the roadmap for Druid's future advantage?

1 comment

Those are good questions.

IMO Druid is best differentiated if you want to power an online, real-time, high-concurrency analytical application at scale. That is the use case Druid was originally designed for and still the one where the project shines the brightest. The reason mostly isn't related to things that database people usually talk about (storage format, indexes, etc). That stuff is important but isn't a major differentiator between systems in today's world. The reason is more related to the pieces in between servers, like locking, replication, fault tolerance, data partitioning and balancing, and resource management. Druid's approach to these things is fairly unusual and gives it characteristics that allow it to do well at powering these sorts of apps at scale. I think it will remain an important advantage of Druid over other systems. Maybe one day the details would make a good blog post :)

As far as the roadmap goes, most of the work we're doing to make Druid better falls into two categories: first, stuff that makes it even better at this core analytical app engine use case; second, stuff that better supports new use cases, like the work on building out SQL. Both are important, so each release usually has a bit of both.