by hazaskull 1034 days ago
Not Postgres-based (but wire- and mostly syntax-compatible): CockroachDB using column families is much like a columnar MPP. Yugabyte is PG-based and MPP but not columnar.
3 comments

The presence and use of column families is only half of the puzzle - it doesn't strictly imply that the execution engine is capable of working in a vectorized columnar style (which is necessary for competitive OLAP).
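To make the distinction above concrete, here is a minimal Python sketch (not CockroachDB code; the table, column names, and function names are made up for illustration). It contrasts row-at-a-time evaluation with batch-at-a-time evaluation over column arrays. Both compute the same filtered sum, but the second runs each operator over a whole column batch, which is the shape a vectorized engine relies on.

```python
from typing import Dict, List

# Hypothetical toy data: the same table in row-wise and column-wise form.
ROWS: List[Dict[str, int]] = [
    {"id": 1, "price": 10, "qty": 3},
    {"id": 2, "price": 25, "qty": 1},
    {"id": 3, "price": 7, "qty": 10},
]
COLUMNS: Dict[str, List[int]] = {
    "id": [1, 2, 3],
    "price": [10, 25, 7],
    "qty": [3, 1, 10],
}


def revenue_row_at_a_time(rows: List[Dict[str, int]], min_qty: int) -> int:
    """Evaluate the whole expression once per row (tuple-at-a-time)."""
    total = 0
    for row in rows:
        if row["qty"] >= min_qty:
            total += row["price"] * row["qty"]
    return total


def revenue_batch_at_a_time(columns: Dict[str, List[int]], min_qty: int) -> int:
    """Run each operator over a whole column batch before the next operator."""
    qty = columns["qty"]
    price = columns["price"]
    # Filter operator: build a selection vector of matching positions.
    selected = [i for i, q in enumerate(qty) if q >= min_qty]
    # Projection operator: multiply only the selected positions.
    products = [price[i] * qty[i] for i in selected]
    # Aggregation operator: sum the resulting batch.
    return sum(products)
```

In a real engine the batch operators would be tight loops over typed arrays rather than Python lists; the sketch only shows the control flow, not the performance characteristics.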
Was unable to edit my previous comment. It does use vectorization: https://www.cockroachlabs.com/docs/stable/vectorized-executi...
Thanks for sharing that - TIL! These blog posts go into more detail:

https://www.cockroachlabs.com/blog/vectorized-hash-joiner/

https://www.cockroachlabs.com/blog/vectorizing-the-merge-joi...

...it seems the distinction here is that the vectorization is only present in the execution layer and not in the storage layer as well. I would guess that from a storage perspective, even with column families in play, everything is being streamed out of a sorted LSM engine regardless. So there isn't additionally some highly-tuned buffer pool serving up batches of compressed column files etc.

Indeed. As I commented elsewhere, this is just about the general design. It is not targeting OLAP in this case (even though I do believe Cockroach employs vectorization for reads).
> CockroachDB using column families is much like a columnar MPP.

I am wondering why they say it is not for OLAP workloads.

They don't optimize for it and I suppose the data distribution is primarily aimed at parallel OLTP rather than OLAP. Just wanted to mention that design-wise it is similar but that's indeed not all there is to it. I'd be hesitant to store large volumes of data on a single PG instance; I don't see how a single-writer, filesystem-based database is suitable at all for data that is large enough to warrant columnar storage.
So, what would be your DB choice for OLAP?
You can also go HTAP with TiDB which has TiKV for OLTP and TiFlash (Raft-based columnar replicas) for OLAP.
I am more interested in actual OLAP than HTAP, and don't see a strong OSS OLAP offering on the market right now. My rants in a previous discussion: https://news.ycombinator.com/item?id=36992039

But I should look at TiDB, it looks like an interesting and relatively mature project.

https://www.starrocks.io/ is on my shortlist for OLAP
No expert but I'd say you'd rather be looking at BigQuery, Redshift, ClickHouse, Snowflake, etc.
Note that column families have nothing to do with columnar storage.

Another example: Cassandra is not column-oriented.

Thank you for the correction. Indeed it is not entirely the same thing. Though I'd expect that at least the benefit of not having to read columns that aren't in the family would still help (haven't tried in earnest). I suppose compression is not an option though.
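A rough Python sketch of the difference being discussed, under a simplified assumption: each row is split into one key-value pair per column family, keyed by (primary key, family name), and kept in sorted key order as an LSM-style store would. The real key encoding differs in detail, and the table, family names, and helper functions here are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical table: "id" is the primary key; the other columns are split
# into two families so the large column lives apart from the small one.
FAMILIES: Dict[str, List[str]] = {"f_small": ["status"], "f_big": ["payload"]}
TABLE: List[Dict[str, Any]] = [
    {"id": 1, "status": "ok", "payload": "x" * 100},
    {"id": 2, "status": "err", "payload": "y" * 100},
    {"id": 3, "status": "ok", "payload": "z" * 100},
]

KV = Tuple[Tuple[int, str], Tuple[Any, ...]]


def to_sorted_kv(rows: List[Dict[str, Any]], families: Dict[str, List[str]]) -> List[KV]:
    """Emit one KV pair per (row, family), sorted by key like a sorted store."""
    kv: List[KV] = []
    for row in rows:
        for fam, cols in families.items():
            kv.append(((row["id"], fam), tuple(row[c] for c in cols)))
    return sorted(kv, key=lambda pair: pair[0])


def to_columnar(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Store one contiguous array per column."""
    return {col: [row[col] for row in rows] for col in rows[0]}


def scan_family(kv: List[KV], fam: str) -> List[Tuple[Any, ...]]:
    """Read a single family; other families' values are never decoded,
    but the matching entries are still interleaved with them by row."""
    return [value for (pk, f), value in kv if f == fam]
```

In this model, scanning `f_small` does avoid touching the `payload` values, which matches the expected benefit. The values of one column are still spread across per-row entries rather than laid out as one contiguous run, which is what a columnar format needs for column-level compression.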