Hacker News new | ask | show | jobs
by aaronblohowiak 6121 days ago
"I was talking to some people today who are struggling to get mysql to do ~100 insert/update operations per second."

I apologize for being flippant, but.. Were they running MySQL on an EeePC ?

They are probably Doing Something Wrong if they are struggling to do >100 insert/updates a second on even modest hardware.

2 comments

Because of how the storage engine works, mysql insert performance degrades significantly as table size grows into double digits of GB, and especially once the PK index no longer fits in RAM.

Cassandra doesn't have to read indexes to figure out where to put the data so write performance stays constant as your data set grows.

A 7,200 RPM disk can do about 120 IO operations/s, max. Each transaction requires at least one IO operation to be properly committed to disk.

So ~100 inserts/updates per second is pretty close to the limit of modest hardware, unless you are either 1) doing more than one insert/update per transaction or 2) leaving write caching enabled on you disk.

#1 is the right thing to do when it works for your app, but it doesn't always work. #2 is the wrong thing to do if you care about the integrity of your data (but its probably not a bad thing to do in development if you understand that you'll probably need a battery-backed caching disk controller to get that kind of performance in production).

Cassandra avoids some of these issues by allowing the developer to specify the desired level of durability. A transaction may have to arrive on disk to be considered complete, or less stringently, it can arrive in memory on one or more additional machines in the cluster.

I'd also guess that cassandra will buffer and batch writes. Postgres can also do this with the write ahead log (WAL). I don't know if mySQL does the same. This trades some (configurable) latency for throughput. A transaction has to wait for the buffer to fill or a timer to expire before it is completed which increases latency, but multiple transactions end up bundled into a single IO operation, which increases throughput.