It doesn't mention anything about what a write unit is, except to say you can reduce write units by batching inserts (that part I guessed already.)
There's no way to think about what an actual write unit means. You could measure the costs on a sample workload, but that's far from ideal. Some transparency here would be nice.
I understand the answer is complicated, based on hairy implementation details, and subject to change. Give me the complexity and let me interpret it according to my needs.
Right, that link covers read units which is also what I expected - essentially the number of files I have to touch - but I still have no clue about write units.
Is one block on one non-partitioned non-distributed table one write unit? What about one insert that's two blocks on such a table? What about one block on a null engine with two MVs listening to insert into two non-partitioned non-distributed tables? What if the table is a replacing mergetree, do I incur WUs for compactions? etc.
My worry is that it is essentially 1 WU = 1 new part file, which I understand makes sense to bill on but is tremendously intransparent for users - at least I have no clue how often we roll new part files, instead I'm focused on total network and disk i/o performance on one side and client query latency on the other.
With analytical column store DBs the standard is to do massive batches writes of thousands to millions of records at a time, vs. inserting individual records. Inserting individual records is basically always crazy inefficient with column stores. So a single write is generally for thousands to millions of records.
Buddy, if you look just a couple posts up you'll see me comment on how ClickHouse's actual disk format works. You don't need to explain batching to me.
Nonetheless you can't insert a-whole-file-and-just-that-file in less than one write.
Where does it say that? The pricing page says on "Writes" in the info tooltip: "Each write operation (INSERT, DELETE, etc) consumes write units depending on the number of rows, columns, and partitions it writes to."
This doesn't imply to me that each individual INSERT costs 1 WU, but that it could be fractional. I guess it depends on how you read it?
There's no way to think about what an actual write unit means. You could measure the costs on a sample workload, but that's far from ideal. Some transparency here would be nice.
I understand the answer is complicated, based on hairy implementation details, and subject to change. Give me the complexity and let me interpret it according to my needs.