by gianm 2705 days ago
Check out Druid [1], an open-source analytical database with tightly-coupled storage and processing engines designed for OLAP. In particular it implements a memory-mappable storage format, indexes, compression, late tuple materialization, and query engines that can operate directly on compressed data. There is a patch out to add vectorized processing as well, so you should expect to see that show up in a future release.

Its storage format and processing engine aren't designed to be embedded in the same way as RocksDB and SQLite are, but you certainly could if you wanted to, since the code is fairly modular. Or you could use it as a standalone service as it was designed to be used.

[1] http://druid.io/
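
To make "query engines that can operate directly on compressed data" a bit more concrete, here is a minimal sketch using run-length encoding. This is a generic illustration and not Druid's actual storage format or code; the function names (`rle_encode`, `rle_sum`, `rle_count_equal`) are made up for this example.

```python
from typing import List, Tuple

# A run is (value, run_length). Illustrative only, not Druid's format.
Run = Tuple[int, int]


def rle_encode(values: List[int]) -> List[Run]:
    """Compress a column into runs of repeated values."""
    runs: List[Run] = []
    for v in values:
        if runs and runs[-1][0] == v:
            # Extend the current run instead of storing the value again.
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def rle_sum(runs: List[Run]) -> int:
    """Sum the column without expanding it: one multiply per run."""
    return sum(value * length for value, length in runs)


def rle_count_equal(runs: List[Run], target: int) -> int:
    """Count rows equal to target by looking only at run headers."""
    return sum(length for value, length in runs if value == target)


column = [5, 5, 5, 2, 2, 9]
runs = rle_encode(column)
print(runs)                      # [(5, 3), (2, 2), (9, 1)]
print(rle_sum(runs))             # 28
print(rle_count_equal(runs, 5))  # 3
```

The point is that both aggregates touch one entry per run rather than one per row, so the work scales with the compressed size.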


There's also ClickHouse [1], which seems to scale much better than Druid and makes similar architectural decisions, so it works as a fairly general columnar store for OLAP uses. Cloudflare wrote an article comparing ClickHouse and Druid; they chose ClickHouse because they could get similar performance on the same workload with 9 ClickHouse nodes, where Druid would have required hundreds. They built all of Cloudflare's DNS analytics on ClickHouse [2].

Disclosure: I work at Percona. We've seen a lot of our customers make use of ClickHouse, and our consulting group has begun some services work of its own around it. It's now one of the primary databases talked about at our conferences, and we post about it regularly [3].

[1]: https://clickhouse.yandex/
[2]: https://blog.cloudflare.com/how-cloudflare-analyzes-1m-dns-q...
[3]: https://www.percona.com/blog/2018/10/01/clickhouse-two-years...

There is a very good article [1] by one of the Druid committers about ClickHouse/Druid/Pinot that goes into some detail on why the Cloudflare tests turned out the way they did.

[1]: https://medium.com/@leventov/comparison-of-the-open-source-o...

That article is better than I expected, and it matches my own experience with ClickHouse. It was a great match for our use case, for some of the reasons given in the article. We could also have used an inverted index had one been available; surprisingly, we survived without it.

Does ClickHouse support fine-grained data security (for example, role A gives access only to tuples with column X == 123)?

No [1]. ClickHouse is a fairly low-level tool. If you need that kind of thing, you build an ACL-aware app on top of it.

[1] https://clickhouse.yandex/docs/en/operations/access_rights/
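
As a rough sketch of what "an ACL-aware app on top of it" could look like: the application, not the database, attaches a mandatory row filter for the caller's role before any query is sent. Everything here is hypothetical (the `ROLE_ROW_FILTERS` mapping, `build_scoped_query`, the `events` table), and the `%(name)s` placeholder style is an assumption that you would adapt to whatever client library you use.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mapping: role -> (column, the only value that role may see).
ROLE_ROW_FILTERS: Dict[str, Tuple[str, int]] = {
    "role_a": ("X", 123),
}

# Identifiers are interpolated into the SQL text, so restrict them strictly.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_identifier(name: str) -> str:
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def build_scoped_query(
    role: str, table: str, columns: List[str]
) -> Tuple[str, Dict[str, int]]:
    """Return (sql, params) with the role's row filter always applied."""
    if role not in ROLE_ROW_FILTERS:
        # Unknown roles get nothing rather than an unfiltered query.
        raise PermissionError(f"no access rule for role {role!r}")
    acl_column, acl_value = ROLE_ROW_FILTERS[role]
    select_list = ", ".join(_check_identifier(c) for c in columns)
    sql = (
        f"SELECT {select_list} FROM {_check_identifier(table)} "
        f"WHERE {_check_identifier(acl_column)} = %(acl_value)s"
    )
    # The filter value travels as a parameter, never as SQL text.
    return sql, {"acl_value": acl_value}


sql, params = build_scoped_query("role_a", "events", ["ts", "X"])
print(sql)     # SELECT ts, X FROM events WHERE X = %(acl_value)s
print(params)  # {'acl_value': 123}
```

This only works if clients can reach the database exclusively through such a layer; anyone with direct access bypasses the filter.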