by ethangunderson 5514 days ago
When we talk about consistency, we're talking about taking the database from one consistent state to another.

With replica sets, we're still only dealing with one master. We can get inconsistent reads from the replicas, but we're always writing to a single master, which allows that master to determine the integrity of a write.

With sharding, we're still only dealing with one canonical home for a specific key (defined by the shard key). (Besides latency, I'm not sure how datacenters would affect this.)
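To make the "one canonical home" point concrete, here is a rough sketch of range-based routing. The split points and shard names are made up for illustration, and this is not how any particular database implements it; it only shows that a given shard key always resolves to exactly one shard.

```python
import bisect

# Hypothetical layout: two split points divide the key space into three ranges.
SPLIT_POINTS = ["h", "p"]                      # must stay sorted
SHARDS = ["shard-a", "shard-b", "shard-c"]     # one more shard than split points


def home_shard(shard_key):
    """Return the single shard that owns the range containing shard_key."""
    return SHARDS[bisect.bisect_right(SPLIT_POINTS, shard_key)]
```

Every write for a given key goes through that one shard, so there is never a second copy that could accept a conflicting write.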

What we're giving up in this case is availability. If an entire replica set goes down, we can't read or write any data for the key ranges contained on those machines. This is where Riak shines.

With Riak, any node can accept writes, and nodes contain copies of several other nodes' data. What that means is, as long as we have one node up, we can write to the database. Because of this, there is the possibility of nodes having different views of the data. This is handled in a number of ways (read repairs, vector clocks, etc.). Check out the Amazon Dynamo paper for more info; it's a great read.
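For anyone who hasn't run into vector clocks, here is a minimal sketch of the general idea. It is a textbook version, not Riak's actual implementation, and the function names and node names are my own. Each clock is a dict of node name to counter, and comparing two clocks tells you whether one write descends from the other or whether they conflict.

```python
def increment(clock, node):
    """Return a copy of clock with node's counter bumped by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a, b):
    """Classify clock a relative to clock b."""
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"   # neither descends from the other: a conflict
    if a_ahead:
        return "after"        # a has seen everything b has, plus more
    if b_ahead:
        return "before"
    return "equal"
```

If two nodes each accept a write on top of the same starting value, the resulting clocks compare as "concurrent", and that is the case something like read repair or the application has to resolve.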

I'm sure I'm missing some stuff, but I think that covers the gist of it.

EDIT: One thing that I want to make clear: I don't think that one architecture is better than the other. They each have their own pros and cons, and are really suited to solve different problems.


None of this is guaranteed by default. By default, writes are flushed every 60 seconds. By default, there's no journaling. How can one claim full consistency if the former two points are true?

Don't get me wrong, I love mongo. I'm building a web app backed by it. But the marketing talk is grating, which is what this post nails.

I think those two issues are orthogonal to consistency. In ACID, consistency and durability are two different letters and CAP doesn't even mention durability. Are you referring to another definition of consistency?
How is flushing a write every 60 seconds orthogonal to consistency? If there's a server crash between the write to RAM and the subsequent flush, the data is lost, is it not? How do you guarantee the data is there in that case?
That would mean the data set was not durable; it doesn't speak to consistency at all. DB consistency is about transaction ordering. Transaction 1 always comes before transaction 2, but 2 may exist or not as it pleases. Transaction 1 must be present if 2 is present.
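A tiny sketch of that rule, using a simplified model I'm assuming here (transactions are IDs in commit order; `is_consistent` is a made-up helper, not anything from a real database): whatever survives a crash has to be a prefix of the commit order.

```python
def is_consistent(commit_order, surviving):
    """True if the surviving transactions form a prefix of the commit order."""
    surviving = list(surviving)
    return surviving == list(commit_order[:len(surviving)])
```

Losing transaction 3 in a crash, so that only [1, 2] remain, is a durability failure but still passes this check. Ending up with [2] and no 1 would fail it, and that would be the consistency problem.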