Hacker News new | ask | show | jobs
by jordanlewis 1388 days ago
Actually, the designers of TPC-C were ahead of their time - they deliberately designed TPC-C to not be trivially shardable. The new order transaction, which is the heart of the workload, is required to cause a read outside of a shard 1% of the time [0] for each of the 10 on average items in a new order. So, 10% of the time on average, the workload should emit a cross-shard transaction.

A correctly implemented TPC-C workload does actually need to issue cross-shard transactions. There are a lot of "TPC-C-like" workloads that don't actually implement this behavior, though, along with other harder to handle attributes like foreign keys and sleep/wait time.

TPC-C is really an enduring, interesting benchmark.

[0]: See section 2.4.1.5 2 of the spec https://www.tpc.org/tpc_documents_current_versions/pdf/tpc-c...

1 comments

That's true, I should have been a bit more specific.

Each individual query runs within a single shard as far I recall (its been a few years...). There maybe transactions (like new order) that do multiple queries such that the transaction does read and writes to more then one shard. This doesn't really change how easy the workload is to scale out too much though. Harder to scale workloads require a single query to run across many (often all) shards or require data reshuffling for joins (there is not one good shard key that keeps all joins shard local). In these cases adding more nodes or shards will make each query a bit slower.