Hacker News new | ask | show | jobs
by rarrrrrr 6121 days ago
In SQL systems, you can easily shift the computation cost from at read time to at write time. That's what triggers are for. When new data comes in that changes a result you know you need quickly, a trigger can automatically add to a work queue to update a result table.

Since the trigger has automatic access to the contents of the new data (and old data in case of an update or delete) the computation to update the results table can often be made much faster.

Every situation is different of course, but it's overreaching to say that SQL systems have no options beyond read-time computation of results.

1 comments

This was my first thought too. Why not just denormalize the data in MySQL? What benefit is Cassandra really offering here over that option?
Cassandra gives you two benefits.

First, Cassandra uses a disk layout similar to the one described in the Bigtable paper (http://labs.google.com/papers/bigtable.html sections 5.3 and 5.4); in particular it does no random writes. Relational databases like mysql pretty much all use btree-based storage which was great 20 years ago but is terrible today when seeks are your bottleneck.

I was talking to some people today who are struggling to get mysql to do ~100 insert/update operations per second. Cassandra will easily give you 10x that -- _per node_.

The second benefit is that Cassandra gives you real, scalable partitioning, invisible to the app, for when you do need to add nodes. When you have more than a handful of machines, not having to babysit replication + partitioning is a huge, huge deal.

"I was talking to some people today who are struggling to get mysql to do ~100 insert/update operations per second."

I apologize for being flippant, but.. Were they running MySQL on an EeePC ?

They are probably Doing Something Wrong if they are struggling to do >100 insert/updates a second on even modest hardware.

Because of how the storage engine works, mysql insert performance degrades significantly as table size grows into double digits of GB, and especially once the PK index no longer fits in RAM.

Cassandra doesn't have to read indexes to figure out where to put the data so write performance stays constant as your data set grows.

A 7,200 RPM disk can do about 120 IO operations/s, max. Each transaction requires at least one IO operation to be properly committed to disk.

So ~100 inserts/updates per second is pretty close to the limit of modest hardware, unless you are either 1) doing more than one insert/update per transaction or 2) leaving write caching enabled on you disk.

#1 is the right thing to do when it works for your app, but it doesn't always work. #2 is the wrong thing to do if you care about the integrity of your data (but its probably not a bad thing to do in development if you understand that you'll probably need a battery-backed caching disk controller to get that kind of performance in production).

Cassandra avoids some of these issues by allowing the developer to specify the desired level of durability. A transaction may have to arrive on disk to be considered complete, or less stringently, it can arrive in memory on one or more additional machines in the cluster.

I'd also guess that cassandra will buffer and batch writes. Postgres can also do this with the write ahead log (WAL). I don't know if mySQL does the same. This trades some (configurable) latency for throughput. A transaction has to wait for the buffer to fill or a timer to expire before it is completed which increases latency, but multiple transactions end up bundled into a single IO operation, which increases throughput.

It's interesting what data structures would be optimal for the SSDs from the near future. If random writes are only a controller problem, it may be the good old btree again.