by vvern 565 days ago
Can people please, for the love of god, stop running tpcc with think time disabled. When run in this way it is not the TPC-C benchmark, and is not "simulating real database workloads that is considered a modern standard in database applications." TPC-C generally has an open-loop traffic arrival rate that scales with the size of the data and is lightly contended. When run without think time, it becomes closed loop, and generally dominated by the contention that was not supposed to be dominant.

This instance is less bad than some in that it's at least comparing the same sort of database and doing it using the same driver -- so it is at least an apples to apples measurement of something.

Still, please, as a community we need to stop getting rid of the think time and quoting the output as tpmC or as a standard benchmark.

See https://www.tpc.org/tpc_documents_current_versions/pdf/tpc-c... for the spec.
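To make the think-time point concrete, here is a rough sketch of the throughput ceiling a single terminal can reach when keying and think times are honored. The keying times, mean think times, and mix shares below are as I recall them from the spec and are worth verifying against it; the function name and the `response_time_s` parameter are made up for illustration.

```python
# Per transaction type: (keying time s, mean think time s, share of the mix).
# Values recalled from the TPC-C spec; treat them as assumptions and verify.
PROFILE = {
    "new_order":    (18.0, 12.0, 0.45),
    "payment":      (3.0, 12.0, 0.43),
    "order_status": (2.0, 10.0, 0.04),
    "delivery":     (2.0, 5.0, 0.04),
    "stock_level":  (2.0, 5.0, 0.04),
}


def tpmc_ceiling_per_terminal(response_time_s: float = 0.0) -> float:
    """Upper bound on New-Order transactions per minute for one terminal."""
    # Average wall-clock time one terminal spends per transaction.
    cycle_s = sum(
        share * (keying + think + response_time_s)
        for keying, think, share in PROFILE.values()
    )
    transactions_per_minute = 60.0 / cycle_s
    # tpmC counts only New-Order transactions.
    return transactions_per_minute * PROFILE["new_order"][2]


if __name__ == "__main__":
    per_terminal = tpmc_ceiling_per_terminal()
    print(f"ceiling per terminal: {per_terminal:.2f} tpmC")
    print(f"ceiling per warehouse (10 terminals): {10 * per_terminal:.2f} tpmC")
```

With the waits in place the ceiling depends on the number of terminals, not on how fast the database answers. Set the waits to zero and the only remaining term is the response time, so every terminal issues work as fast as the database returns it.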

2 comments

Thank you for your feedback! We tried to enable think time with go-tpc, thanks to @pashkinelfe. That leaves us with 1 tpmC per connection, growing linearly up to ~300 connections for both heap and OrioleDB. So, in order to hit a storage bottleneck, we would need tens of thousands of connections. Given that PostgreSQL runs a process per connection, that would be more of a stress test for the Linux kernel. Additionally, PostgreSQL requires N^2 of memory depending on the number of connections, and that becomes noticeable at this scale. All of that could be resolved by migrating PostgreSQL to coroutines and resolving the memory requirement issues. However, this is currently out of scope for us. Could you recommend another benchmark to reveal storage bottlenecks, given that TPC-B and YCSB are too trivial?
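The connection arithmetic above can be sketched as follows. The ~1 tpmC per connection figure comes from the go-tpc run described in this comment; the target of 30,000 tpmC is a purely hypothetical number picked for illustration, not a measured storage limit.

```python
import math


def connections_needed(target_tpmc: float, tpmc_per_connection: float = 1.0) -> int:
    """Connections required to reach target_tpmc when each one is capped by think time."""
    return math.ceil(target_tpmc / tpmc_per_connection)


if __name__ == "__main__":
    # Hypothetical target; the real saturation point of the storage is unknown here.
    target = 30_000
    n = connections_needed(target)
    # PostgreSQL runs one backend process per connection.
    print(f"{target} tpmC needs about {n} connections, i.e. {n} backend processes")
```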

> Additionally, PostgreSQL requires N^2 of memory depending on the number of connections,

To be clear, not all PostgreSQL memory is N^2. AFAIR, just a couple of components, including deadlock detection, require a quadratic amount of memory. Normally they are insignificant, but they grow fast if you raise max_connections.
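A quick sketch of why a quadratic component goes from negligible to noticeable: it compares how much a linear and a quadratic memory term grow when the connection limit is raised. It only computes growth ratios, since the actual per-component sizes are not given in this thread; the function name is made up for illustration.

```python
def growth_factors(base_connections: int, new_connections: int) -> tuple[float, float]:
    """Return (linear growth, quadratic growth) when the connection count is raised."""
    ratio = new_connections / base_connections
    return ratio, ratio ** 2


if __name__ == "__main__":
    # From the ~300 connections tested to the tens of thousands discussed above.
    linear, quadratic = growth_factors(300, 30_000)
    print(f"linear components grow {linear:.0f}x, quadratic ones {quadratic:.0f}x")
```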

One approach to mitigate the connection problems for tpcc would be to utilize a connection pooler like pgbouncer or yandex/odyssey. It’s certainly more complexity.

Another suite to look at is sysbench. It’s very flexible, for better or for worse, but it can allow you to create an interesting mix of queries at different scale factors. For something like this where you’re going head to head with Postgres, having more dimensions with more benchmarks isn’t going to hurt. Ideally you’ll see a nice win across the board and get an understanding of the shape of differences.

I'll explore performance with this parameter. Thanks for the advice!