Hacker News new | ask | show | jobs
by ncw33 3490 days ago
I agree, Cassandra is doing here exactly what I (as a user) would expect.

When using a DB like C*, you always have to ask yourself, "does this update happen before or after this one - have I done anything to ensure that's the case?"

In this example, the second query (the UPDATE) is being partially applied before the first query (the INSERT) - and that's OK, because there's no ordering or dependency in the second query that forces it to run second. So the reordering of the queries that he's observing is legal, and can be simply avoided with "UPDATE ... IF revision = :last-rev".

2 comments

My biggest complaint about C: SQL-like interface. These seems like a terrible decision to me.

If you are used to relational databases and SQL, it's a real struggle to get your head in the right place. Some of the gotchas:

Cell level vs row level consistency/locking (as stated in this article) * Grouping / counting isn't really a thing. * WHERE statements are actually used for identifying the key/value pair you want and not for filtering the rows. * Indexes aren't really indexes. They are definitions of partitions and sort order.

That's actually my favourite part of C*. I think the CQL interface makes programming against it extremely easy, and I dare might say enjoyable.

It makes getting standard, and understanding what's going on a breeze.

> the second query (the UPDATE) is being partially applied before the first query (the INSERT) - and that's OK

"Partially applied" is ok with a database?

The description of Cassandra on it's site is "Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data."

If it's mission-critical data, I wouldn't do arbitrary things with it for conflict resolution that can corrupt data.

Cassandra is perfectly fine. If you want your writes to be consistent, learn to use LWTs properly. The same way, if you want your data to be fully consistent in RDBMS, learn to use SERIALIZABLE isolation level (which is not default in most RDBMSes for performance reasons). If, in an RDBMS, you use SERIALIZABLE for half of the updates and READ UNCOMMITTED for another half, guess what consistency guarantees do you really get?
This is not arbitrary. It may be not intuitive coming from a rdbms background, but it isn't arbitrary.

As Johnathon pointed out, it's like properly using synchronized or volatile half the time. I don't call it sometimes not working as arbitrary. I call it expected for not following the rules of the system.

If I follow your argument correctly, it is basically the same argument as "your C compiler is correct, what you've written is invalid and the standard allows undefined behaviour here".

Which may be a technically valid argument against the compiler/database system, but it's not a valid argument for defending the system as a whole: if a standard allows arbitrary execution instead of bailing out on non-standard (ambiguous) input, it is unreliable.

Is variable assignment in c/java/... unreliable? It behaves very similar to what C* does. Concurrent access and modification will produce undefined behaviour if you don't explicitly protect it.
Exactly.

Getting access to things like concurrent locks is HARD to get right. That is why there are so many simple languages that don't let you touch concurrency.

Doesn't mean there is no need for it in the world, and no one should be able to use it.

RDBMSes can cause similar inconsistencies if you don't know what you're doing. It is like setting read uncommitted and then complaining about dirty reads.
Let's pretend I'm leading a blind child by telling them which direction they should go, and if they don't step carefully, they could be hurt.

If I can't see the child, should I continue to give them direction, or tell them to stop?

In this use case, the database makes changes to data without knowing what is correct and what is harmful. That is not the user's fault. It's a code choice.

No, it's a users choice to use a DB that has this tradeoff.

Like it was said a couple of time already, all of your complaints about C* would work just as well for locking mechanisms in most popular languages.