by easp 6113 days ago
A 7,200 RPM disk can do about 120 IO operations/s, max. Each transaction requires at least one IO operation to be properly committed to disk.
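
One way to arrive at that figure, assuming each durable commit has to wait for roughly one platter rotation (my assumption, and `max_commits_per_second` is just an illustrative name), is this back-of-the-envelope sketch:

```python
def max_commits_per_second(rpm: float, ios_per_commit: int = 1) -> float:
    """Rough upper bound on durable commits/s for a single spinning disk.

    Assumes every IO needed by a commit costs one full platter rotation.
    """
    rotations_per_second = rpm / 60.0
    return rotations_per_second / ios_per_commit


print(max_commits_per_second(7200))  # 120.0
```

Real disks will land somewhat below this once seeks and other IO are in the mix, so treat it as a ceiling rather than a prediction.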

So ~100 inserts/updates per second is pretty close to the limit of modest hardware, unless you are either 1) doing more than one insert/update per transaction or 2) leaving write caching enabled on your disk.

#1 is the right thing to do when it works for your app, but it doesn't always work. #2 is the wrong thing to do if you care about the integrity of your data. It's probably not a bad thing to do in development, though, as long as you understand that you'll probably need a battery-backed caching disk controller to get that kind of performance in production.
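
To make #1 concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is a stand-in I picked because it runs anywhere; the `events` table and the function names are made up for illustration. It uses an in-memory database, so it only counts commits and does not measure real disk IO.

```python
import sqlite3


def insert_one_per_transaction(conn, rows):
    """Commit after every row: one commit (and so at least one IO) per insert."""
    commits = 0
    for row in rows:
        conn.execute("INSERT INTO events (payload) VALUES (?)", (row,))
        conn.commit()
        commits += 1
    return commits


def insert_batched(conn, rows, batch_size=100):
    """Commit once per batch: many inserts share a single commit."""
    commits = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        conn.executemany(
            "INSERT INTO events (payload) VALUES (?)",
            [(row,) for row in batch],
        )
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
rows = [f"event-{i}" for i in range(1000)]

per_row_commits = insert_one_per_transaction(conn, rows)
batched_commits = insert_batched(conn, rows, batch_size=100)
print(per_row_commits)   # 1000
print(batched_commits)   # 10
```

At roughly 120 commits/s, the first version would need over 8 seconds on a real disk with write caching off, while the second would need well under one. The catch is that a failure loses or rolls back the whole batch, which is why it doesn't fit every app.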

Cassandra avoids some of these issues by allowing the developer to specify the desired level of durability. A transaction may have to arrive on disk to be considered complete or, less stringently, it may only have to arrive in memory on one or more additional machines in the cluster.

I'd also guess that Cassandra will buffer and batch writes. Postgres can also do this with the write-ahead log (WAL). I don't know if MySQL does the same. This trades some (configurable) latency for throughput. A transaction has to wait for the buffer to fill or a timer to expire before it is completed, which increases latency, but multiple transactions end up bundled into a single IO operation, which increases throughput.
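
The buffer-or-timer idea can be sketched as a toy simulation. This is not how Postgres or Cassandra actually implement it; `GroupCommitLog`, `max_batch` and `max_wait_ms` are hypothetical names, and time is passed in by hand instead of using a real clock.

```python
from dataclasses import dataclass, field


@dataclass
class GroupCommitLog:
    """Toy log that flushes when the buffer fills or a timer expires."""
    max_batch: int = 4
    max_wait_ms: float = 10.0
    buffer: list = field(default_factory=list)
    first_arrival_ms: float = 0.0
    flushes: list = field(default_factory=list)  # (flush time, batch) pairs

    def submit(self, now_ms, txn):
        # Flush anything whose timer already expired before accepting more.
        self.tick(now_ms)
        if not self.buffer:
            self.first_arrival_ms = now_ms
        self.buffer.append(txn)
        if len(self.buffer) >= self.max_batch:
            self._flush(now_ms)

    def tick(self, now_ms):
        # Timer path: the oldest buffered transaction has waited long enough.
        if self.buffer and now_ms - self.first_arrival_ms >= self.max_wait_ms:
            self._flush(self.first_arrival_ms + self.max_wait_ms)

    def _flush(self, at_ms):
        # One flush stands in for one IO operation covering the whole batch.
        self.flushes.append((at_ms, list(self.buffer)))
        self.buffer.clear()


log = GroupCommitLog(max_batch=4, max_wait_ms=10.0)
for t, txn in [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (20, "e"), (35, "f")]:
    log.submit(t, txn)
log.tick(50)

print(len(log.flushes))  # 3
```

Six transactions end up in three flushes. The burst of four shares one IO with almost no added wait, while the two stragglers each sit for the full 10 ms timer before being written alone. That is the latency-for-throughput trade in miniature.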