by zX41ZdbW 1351 days ago
One write unit is around 100..200 INSERT queries.

If you are doing INSERTs in batches of one million rows, it will give

    SELECT formatReadableQuantity(1000000 * 100 / 0.0125)
    
    8.00 billion
inserted rows per dollar. Pretty good, IMO.

If you are doing millions of INSERT queries with one record each, without the "async_insert" setting, it will cost much more.
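To put rough numbers on the two cases above, here is a minimal sketch of the same arithmetic in Python. It assumes that the 0.0125 in the query is the price of one write unit in dollars and that one write unit covers 100 INSERT queries (the low end of the 100..200 range); neither is spelled out in the comment, and the constant and function names are made up for illustration. It does not model "async_insert" or any server-side batching.

```python
# Assumptions (inferred from the query above, not confirmed pricing):
# - 0.0125 is the price of one write unit, in dollars
# - one write unit covers 100 INSERT queries (low end of 100..200)
PRICE_PER_WRITE_UNIT_USD = 0.0125
INSERTS_PER_WRITE_UNIT = 100


def rows_per_dollar(rows_per_insert: int) -> float:
    """Rows inserted per dollar when every INSERT carries rows_per_insert rows."""
    rows_per_write_unit = rows_per_insert * INSERTS_PER_WRITE_UNIT
    return rows_per_write_unit / PRICE_PER_WRITE_UNIT_USD


batched = rows_per_dollar(1_000_000)  # one million rows per INSERT
single = rows_per_dollar(1)           # one row per INSERT

print(f"{batched:,.0f}")  # 8,000,000,000
print(f"{single:,.0f}")   # 8,000
```

Under these assumptions the price per row differs by exactly the batch size, a factor of one million, which is the gap the comment is pointing at.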

That's why we have "write units" instead of just counting inserts.


More helpful would be answers to my questions at https://news.ycombinator.com/item?id=33081502 - async_insert is a relatively new feature, we're still using buffer tables for example - but also most of our "client" inserts are actually onto multi-MV-attached null engines. Those MVs are also often doing some pre-aggregation before hitting our MTs as well. So we might insert a million rows, but the MV aggregates that down into 50k, but then that gets inserted into five persistent tables, each of which has its own sharding/partitioning so that blows up to 200k or something "rows" again. (And at some point those inserts are also going to get compacted into stuff inserted previously / concurrently by the MT itself.)

As I've said several times in this thread, I understand why you don't count inserts or rows. What I don't understand is what unit a WU actually corresponds to. In particular I don't understand its relation to e.g. parts or blocks, which are the units one would focus on when optimizing self-hosted offerings.

I think optimizations that you focus on for self-hosted ClickHouse are the same as for Cloud. In self-hosted it helps to improve your throughput/capacity with fixed allocated resources. In cloud it directly affects cost.

For those complex pipelines you may find it more useful to run tests during the trial. Data distribution, partitioning and so on can change the actual cost significantly, so estimates can be too pessimistic or too optimistic.

> For those complex pipelines you may find it more useful to run tests during the trial.

Right, that's exactly what I don't want to deal with. Unless I have even just a ballpark estimate of complex pipelines both before I commit to any sales crap and afterwards when we're designing new pipelines, it's just not an option for us at all. I have no clue if it's going to cost us $10, $100, or $10000.