
Sure, these tests were not using really large batch sizes because of the other benchmarks we were trying to replicate (but with more detail). Honestly, for this single-instance setup, we saw improvement in CH when we went from (say) 5k to 10k or 20k rows per batch. But it was a few percentage points at a time, not an order of magnitude. I'm sure things change with a cluster setup too; that just wasn't the focus of this post.

Interestingly, we were just testing a multi-node TimescaleDB cluster the other day and found that 75k rows/batch was the optimal size as nodes increased.

So you're completely correct. I tried to be very clear that we were not intentionally "cooking the books", and there are surely other optimizations we could have made. Most of the suggestions so far, however, require further setup of CH features that haven't been used in other benchmarks, so we tried to over-communicate our strategy and process.

We also fully acknowledged in the post that a siloed "insert", wait, then "query" test is not real-world. But it's the way TSBS has been used so far, and other DB engines have come along and used the same methodology for now. Maybe that process will change in time with other contributions.

BTW, we'll discuss some of this next week during the live-stream and the video will be available after.