|
Hey Mani. Druid committer here. It actually is a column store! The project makes a big deal about its ability to do indexes and pre-aggregation because those are important capabilities and, while not unique, are not supported by every column store out there. So they are interesting differentiators. But architecturally they are really just extra icing on the cake. Personally I see stuff like Druid, MemSQL, ClickHouse, Redshift, BigQuery, and Snowflake as technological siblings in the space. These systems are all evolving rapidly too (well, the healthy ones are anyway) so it's definitely a good time to be an analytical database enthusiast.
With regard to the operational complexity, that's an interesting point. It shows up in two main ways, I think -- the multi-process architecture and the use of external deep storage. On huge clusters, which is what Druid was designed for, the idea is that explicitly separating components in this way gives you three benefits: they don't interfere with each other (spikes in ingestion load won't interfere with the ability to query historical data), you can scale each one individually, and it makes most components "disposable" (as long as your storage is reliable, the other Druid components can be blown away and recreated without losing any data). It helps when you're trying to run a big cluster in a stateless / containerized environment. But these aspects are less good on small clusters or single servers, where it just feels like a bunch of overhead. So we're currently working on simplifying some of this for people who aren't running huge clusters.
We're also expanding SQL support rapidly. Almost every release adds additional SQL capabilities. The next release is a big one, adding JOIN and GROUPING SETS operators. The project's goal is to support it all before too long -- up next after this release will likely be analytic functions.
If you're interested in checking out the community, we do meetups pretty often (all virtual now, though, due to COVID-19). We're also planning our first user conference later in the year @ https://druidsummit.org/. |
I guess I should've said "relational columnstore" to describe the others. Vertica has S3/remote storage interfaces similar to Historicals, and all vendors are adding indexing to columnstore segments beyond partition/zone maps for fast seeks. MemSQL is the most advanced, with in-memory tables to augment the disk-based columnstores.
The improved SQL support will help and the overall design of Druid makes sense, but I stand by my view that it's tough to recommend over the alternatives right now. If everything's converging on similar functionality, what would you say is the roadmap for Druid's future advantage?