by tylerhannan 1353 days ago
It's Tyler from ClickHouse.

Check out the response below that has a reference to some of our billing FAQs.

It doesn't mention anything about what a write unit is, except to say you can reduce write units by batching inserts (that part I guessed already.)

There's no way to think about what an actual write unit means. You could measure the costs on a sample workload, but that's far from ideal. Some transparency here would be nice.

I understand the answer is complicated, based on hairy implementation details, and subject to change. Give me the complexity and let me interpret it according to my needs.

Absolutely.

Working on updating the FAQ and tooltips now and sharing your feedback. <3

Right, that link covers read units which is also what I expected - essentially the number of files I have to touch - but I still have no clue about write units.

Is one block on one non-partitioned, non-distributed table one write unit? What about one insert that's two blocks on such a table? What about one block on a Null engine table with two MVs listening, each inserting into its own non-partitioned, non-distributed table? What if the table is a ReplacingMergeTree, do I incur WUs for compactions? etc.

My worry is that it is essentially 1 WU = 1 new part file, which I understand makes sense to bill on but is tremendously opaque for users. At least, I have no clue how often we roll new part files; instead I'm focused on total network and disk I/O performance on one side and client query latency on the other.

I can assure you that 1 WU is not 1 part. Not even close. You can check it using trial credits with your own data.

For example, I just checked that uploading the 1.1 GB example table (cell_towers, 14 columns) cost me 0.38 write units.

Then I'm even more confused, because the pricing page clearly says write operations consume at least one WU.

With analytical column-store DBs, the standard is to do massive batch writes of thousands to millions of records at a time, rather than inserting individual records. Inserting individual records is basically always crazy inefficient with column stores. So a single write is generally for thousands to millions of records.

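To make the batching point concrete, here is a minimal Python sketch that groups rows into large batches before they are sent. The names `batched` and `count_inserts` and the sample rows are hypothetical, no real client is called, and nothing here models how write units are actually computed (the thread never establishes that). It only shows how many insert operations the same rows turn into at different batch sizes.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Hypothetical row shape, used only for illustration.
Row = Tuple[int, str]


def batched(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield consecutive lists of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def count_inserts(rows: Iterable[Row], batch_size: int) -> int:
    """Count how many insert operations the rows would be split into."""
    return sum(1 for _ in batched(rows, batch_size))


if __name__ == "__main__":
    rows = [(i, f"tower-{i}") for i in range(10_000)]
    # One insert per row versus a few large inserts for the same data.
    print(count_inserts(rows, 1))
    print(count_inserts(rows, 5_000))
```

In a real pipeline each yielded batch would be passed to whatever insert call your client library provides; that call is deliberately left out here.
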
Buddy, if you look just a couple posts up you'll see me comment on how ClickHouse's actual disk format works. You don't need to explain batching to me.

Nonetheless you can't insert a-whole-file-and-just-that-file in less than one write.

Where does it say that? The info tooltip for "Writes" on the pricing page says: "Each write operation (INSERT, DELETE, etc) consumes write units depending on the number of rows, columns, and partitions it writes to."

This doesn't imply to me that each individual INSERT costs 1 WU, but that it could be fractional. I guess it depends on how you read it?

The tooltip has been changed since my comment was posted; it's no longer incorrect, but it still doesn't really tell me anything more useful.

(See https://news.ycombinator.com/item?id=33081099 for the original wording.)