| HN Mirror


by vjerancrnjak 422 days ago

It's quite amazing how a db like this shows that all of those row-based dbs are doing something wrong; they can't even approach these speeds with B-tree index structures. I know they care more about transactions than ClickHouse does, but it's just amazing to see how fast modern machines are: billions of rows per second.

I'm pretty sure they did not even bother to properly compress the dataset; with some tweaking, it could probably have been much smaller than 30 GB. The speed shows that reading the data is slower than decompressing it.

Reminds me of that Cloudflare article where they had a similar idea about encryption being free (slower to read than to decrypt) and found a bug that, when fixed, made this behavior materialize.

The compute engine (chdb) is a wonder to use.


apavlo 422 days ago

> It's quite amazing how a db like this shows that all of those row-based dbs are doing something wrong

They're not "doing something wrong". They are designed differently for different target workloads.

Row-based -> OLTP -> "Fetch the entire records from the order table where user_id = XYZ"

Column-based -> OLAP -> "Compute the total amount of orders from the order table grouped by month/year"
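As a rough illustration of the two access patterns above, here is a minimal Python sketch. The toy table, its column names, and the helper functions `fetch_orders` and `total_by_month` are all hypothetical, and real engines are far more involved; the point is only that a row layout keeps each record together for point lookups, while a column layout lets an aggregate scan just the columns it needs.

```python
from collections import defaultdict

# Hypothetical toy "order" table, made up purely for illustration.
ROWS = [
    {"order_id": 1, "user_id": "XYZ", "month": "2024-01", "amount": 30.0},
    {"order_id": 2, "user_id": "ABC", "month": "2024-01", "amount": 12.5},
    {"order_id": 3, "user_id": "XYZ", "month": "2024-02", "amount": 7.5},
]

# Row-oriented (OLTP-style): every record is stored whole, and an index
# maps user_id to the positions of that user's rows.
user_index = defaultdict(list)
for pos, row in enumerate(ROWS):
    user_index[row["user_id"]].append(pos)


def fetch_orders(user_id):
    """Return the entire records for one user via the index."""
    return [ROWS[pos] for pos in user_index.get(user_id, [])]


# Column-oriented (OLAP-style): each column is stored as its own array.
COLUMNS = {name: [row[name] for row in ROWS] for name in ROWS[0]}


def total_by_month():
    """Sum order amounts per month, touching only two of the four columns."""
    totals = defaultdict(float)
    for month, amount in zip(COLUMNS["month"], COLUMNS["amount"]):
        totals[month] += amount
    return dict(totals)
```

Here `fetch_orders("XYZ")` returns two complete records without scanning the table, while `total_by_month()` never reads `order_id` or `user_id` at all.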


vjerancrnjak 422 days ago

Filtering by user id would also be trivially fast.

It’s mostly transactions that make things slow: the various isolation levels, failures if stale data was read in a transaction, etc.

I understand the difference; it's just a shame there’s nothing close to this read or write rate, even on an index structure that has a copy of the columns.

I’m aware that similar partitioning is available and that it improves write and read rates, but not by these magnitudes.


FridgeSeal 422 days ago

Some of the “new SQL” hybrid (HTAP, hybrid transaction-analytical processing) databases might be of interest to you. TiDB is the main example off the top of my head.


beoberha 422 days ago

look at who you’re arguing with ;)
