Hacker News new | ask | show | jobs
by AdriaanvRossum 309 days ago
> TL;DR: PostgreSQL wraps every statement in a fully transactional context; ClickHouse doesn’t. That means these results aren’t a perfect measure of transaction performance. However, they’re still an interesting and relevant look at how each system handles update workloads under its native execution model.
1 comments

tl;dr Doing a bunch of synchronous writes and full table scans with extremely slow, high latency storage (for a dataset that easily fits in RAM) is slow.
My tl;dr is: Don't do huge updates using WHERE on fields _not_ in index, otherwise it will take days to finish and you get meaningless numbers in benchmarks...
I learned that in a hard way. It was midnight, I had to make some urgent changes to the user table in the production database. We had about 4 million users and needed to make changes to 70k users (due a change in 3rd party).

I figured out after 40 minutes that we are not making any progress. For each user we are probably searching 4 million users. After I added the index to a specific field, it was done quickly.

Oddly enough, for a big enough update, postgres sometimes will do the math and decide that scanning the whole table is faster than scanning the index and then scanning a certain fraction of the table (considering also that it has to rewrite the entire page if any row in the page is updated)