| HN Mirror

	by hazaskull 1034 days ago
	Not Postgres-based (but wire- and mostly syntax-compatible): cockroachDB using column families is much like a columnar MPP. Yugabyte is PG-based and MPP but not columnar.

refset 1034 days ago

The presence and use of column families is only half of the puzzle - it doesn't strictly imply that the execution engine is capable of working in a vectorized columnar style (which is necessary for competitive OLAP).
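To make the distinction concrete, here is a minimal Python sketch contrasting row-at-a-time evaluation with batch-at-a-time, column-wise ("vectorized") evaluation of the same filter-and-sum query. It is not CockroachDB's code. The table, `ColumnBatch`, and the function names are hypothetical. In a real engine the per-batch loops run in compiled code over contiguous arrays, and that is where the speedup comes from; Python only shows the shape of the two approaches.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical toy table: each row is (region_id, amount).
Row = tuple[int, float]


def row_at_a_time_sum(rows: Iterable[Row], region: int) -> float:
    """Row-at-a-time (Volcano-style): filter and aggregate one tuple per step."""
    total = 0.0
    for region_id, amount in rows:
        if region_id == region:
            total += amount
    return total


@dataclass
class ColumnBatch:
    """A batch of rows held column by column: one list per column."""
    region_id: list[int]
    amount: list[float]


def to_batches(rows: list[Row], batch_size: int) -> Iterator[ColumnBatch]:
    """Pivot rows into fixed-size column batches."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        yield ColumnBatch([r[0] for r in chunk], [r[1] for r in chunk])


def vectorized_sum(batches: Iterable[ColumnBatch], region: int) -> float:
    """Batch-at-a-time: each operator runs a tight loop over a single column."""
    total = 0.0
    for batch in batches:
        # Filter operator: scan the region_id column and build a selection vector.
        selection = [i for i, r in enumerate(batch.region_id) if r == region]
        # Aggregate operator: read only the selected positions of the amount column.
        total += sum(batch.amount[i] for i in selection)
    return total


rows = [(1, 10.0), (2, 5.0), (1, 2.5), (3, 1.0)]
print(row_at_a_time_sum(rows, region=1))                         # 12.5
print(vectorized_sum(to_batches(rows, batch_size=2), region=1))  # 12.5
```

Both produce the same answer. The difference is that the vectorized version amortizes per-row interpretation overhead across a whole batch, which is the property an OLAP engine needs regardless of how the data is laid out on disk.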

hazaskull 1034 days ago

I was unable to edit my previous comment. It does use vectorization: https://www.cockroachlabs.com/docs/stable/vectorized-executi...

refset 1025 days ago

Thanks for sharing that - TIL! These blog posts elaborate with more detail:

https://www.cockroachlabs.com/blog/vectorized-hash-joiner/

https://www.cockroachlabs.com/blog/vectorizing-the-merge-joi...

...it seems the distinction here is that the vectorization is present only in the execution layer, not in the storage layer as well. I would guess that from a storage perspective, even with column families in play, everything is streamed out of a sorted LSM engine regardless. So there isn't, in addition, some highly tuned buffer pool serving up batches of compressed column files, etc.
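A toy model of the point above, assuming a CockroachDB-like layout in which each column family of a row becomes its own key-value pair and a scan sees those pairs in sorted key order. The schema, `FAMILIES`, and all function names are hypothetical. The real key encoding, and whether the storage engine can seek past unneeded families, are not modelled. The sketch shows that even when only one family is needed, a vectorized executor receives row tuples from the KV stream and must pivot them into column vectors, whereas a columnar store already holds each column as a contiguous vector.

```python
from typing import Any, Iterator

# Hypothetical schema with two column families (family_id -> column names).
FAMILIES: dict[int, tuple[str, ...]] = {0: ("name",), 1: ("amount", "qty")}

Key = tuple[int, int]  # (primary_key, family_id)


def encode_rows(rows: list[dict[str, Any]]) -> list[tuple[Key, tuple]]:
    """Split each row into one KV pair per column family, sorted by key
    the way an LSM-based engine presents data to a scan."""
    kvs = [((row["pk"], fam_id), tuple(row[c] for c in cols))
           for row in rows
           for fam_id, cols in FAMILIES.items()]
    kvs.sort(key=lambda kv: kv[0])
    return kvs


def scan_family(kvs: list[tuple[Key, tuple]], fam_id: int) -> Iterator[tuple]:
    """Walk the sorted KV stream and keep only the requested family.
    Entries for other families are interleaved with it in key order."""
    for (_pk, fid), value in kvs:
        if fid == fam_id:
            yield value


def pivot_to_columns(values: Iterator[tuple], ncols: int) -> list[list]:
    """A vectorized executor still has to transpose row tuples into column vectors."""
    columns: list[list] = [[] for _ in range(ncols)]
    for value in values:
        for i, x in enumerate(value):
            columns[i].append(x)
    return columns


def to_columnar(rows: list[dict[str, Any]], names: list[str]) -> dict[str, list]:
    """Columnar layout: each column is stored as one contiguous vector (by pk order)."""
    ordered = sorted(rows, key=lambda r: r["pk"])
    return {n: [r[n] for r in ordered] for n in names}


rows = [
    {"pk": 2, "name": "b", "amount": 5.0, "qty": 1},
    {"pk": 1, "name": "a", "amount": 10.0, "qty": 3},
]
kvs = encode_rows(rows)
print(pivot_to_columns(scan_family(kvs, 1), ncols=2))  # [[10.0, 5.0], [3, 1]]
print(to_columnar(rows, ["amount", "qty"]))            # {'amount': [10.0, 5.0], 'qty': [3, 1]}
```

The two paths end with the same vectors, but only the KV path pays for a key-ordered walk plus a row-to-column pivot on every scan.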

hazaskull 1034 days ago

Indeed. As I commented elsewhere, this is just about the general design. It is not targeting OLAP in this case (even though I do believe Cockroach employs vectorization for reads).

riku_iki 1034 days ago

> cockroachDB using column families is much like a columnar MPP.

I wonder why they say it is not meant for OLAP workloads.

hazaskull 1034 days ago

They don't optimize for it, and I suppose the data distribution is primarily aimed at parallel OLTP rather than OLAP. I just wanted to mention that design-wise it is similar, but that's indeed not all there is to it. I'd be hesitant to store large volumes of data on a single PG instance; I don't see how a single-writer, filesystem-based database is suitable at all for data large enough to warrant columnar storage.

riku_iki 1034 days ago

So, what would be your DB choice for OLAP?

esafak 1033 days ago

You can also go HTAP with TiDB, which has TiKV for OLTP and TiFlash (Raft-based columnar replicas) for OLAP.

riku_iki 1033 days ago

I am more interested in actual OLAP than HTAP, and I don't see a strong OSS OLAP offering on the market right now; see my rants in a previous discussion: https://news.ycombinator.com/item?id=36992039

But I should look at TiDB; it looks like an interesting and relatively mature project.

esafak 1033 days ago

https://www.starrocks.io/ is on my shortlist for OLAP

hazaskull 1034 days ago

No expert, but I'd say you'd be better off looking at BigQuery, Redshift, ClickHouse, Snowflake, etc.

ddorian43 1033 days ago

Note that column families have nothing to do with columnar storage.

Another example is Cassandra, which is not column-oriented.

hazaskull 1033 days ago

Thank you for the correction. Indeed, it is not entirely the same thing. Still, I'd expect at least some benefit from not having to read columns that aren't in the family (I haven't tried in earnest). I suppose compression is not an option, though.
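A toy illustration of the two points in the comment above, under stated assumptions. The table is hypothetical: `id` and `status` share one column family, while `payload` lives in another family that a status-only query can skip. Family pruning avoids reading `payload` at all. Within the family, however, values are still laid out row by row with columns interleaved, so a column-contiguous encoding such as run-length encoding (RLE) finds no runs. RLE is used only because it is easy to show; general-purpose block compression inside the storage engine is a separate matter not modelled here.

```python
from itertools import groupby
from typing import Any

# Hypothetical table: "id" and "status" share one family; "payload" is in another.
rows = [
    {"id": 1, "status": "ok", "payload": "x" * 1000},
    {"id": 2, "status": "ok", "payload": "y" * 1000},
    {"id": 3, "status": "ok", "payload": "z" * 1000},
    {"id": 4, "status": "err", "payload": "w" * 1000},
]


def rle(values: list[Any]) -> list[tuple[Any, int]]:
    """Run-length encode a sequence into (value, run_length) pairs."""
    return [(v, sum(1 for _ in group)) for v, group in groupby(values)]


def family_values(rows: list[dict[str, Any]], family: list[str]) -> list[Any]:
    """Values as a family-per-row layout stores them: row by row, columns interleaved."""
    return [row[c] for row in rows for c in family]


def column_values(rows: list[dict[str, Any]], column: str) -> list[Any]:
    """Values as a columnar layout stores them: one column, contiguous."""
    return [row[column] for row in rows]


# What carries over: the payload family is never read.
print(len(family_values(rows, ["id", "status"])))       # 8 (vs. 12 values for whole rows)

# What does not: inside the family, interleaving breaks up the runs.
print(len(rle(family_values(rows, ["id", "status"]))))  # 8 runs, nothing saved
print(rle(column_values(rows, "status")))                # [('ok', 3), ('err', 1)]
```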
