| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by ncw33 3536 days ago

I agree, Cassandra is doing here exactly what I (as a user) would expect.

When using a DB like C*, you always have to ask yourself, "does this update happen before or after this one - have I done anything to ensure that's the case?"

In this example, the second query (the UPDATE) is being partially applied before the first query (the INSERT) - and that's OK, because there's no ordering or dependency in the second query that forces it to run second. So the reordering of the queries that he's observing is legal, and can be simply avoided with "UPDATE ... IF revision = :last-rev".

2 comments

garyrichardson 3536 days ago

My biggest complaint about C: SQL-like interface. These seems like a terrible decision to me.
If you are used to relational databases and SQL, it's a real struggle to get your head in the right place. Some of the gotchas:

Cell level vs row level consistency/locking (as stated in this article) * Grouping / counting isn't really a thing. * WHERE statements are actually used for identifying the key/value pair you want and not for filtering the rows. * Indexes aren't really indexes. They are definitions of partitions and sort order.

link

sargun 3535 days ago

That's actually my favourite part of C*. I think the CQL interface makes programming against it extremely easy, and I dare might say enjoyable.

It makes getting standard, and understanding what's going on a breeze.

link

isoap 3536 days ago

> the second query (the UPDATE) is being partially applied before the first query (the INSERT) - and that's OK

"Partially applied" is ok with a database?

The description of Cassandra on it's site is "Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data."

If it's mission-critical data, I wouldn't do arbitrary things with it for conflict resolution that can corrupt data.

link

pkolaczk 3536 days ago

Cassandra is perfectly fine. If you want your writes to be consistent, learn to use LWTs properly. The same way, if you want your data to be fully consistent in RDBMS, learn to use SERIALIZABLE isolation level (which is not default in most RDBMSes for performance reasons). If, in an RDBMS, you use SERIALIZABLE for half of the updates and READ UNCOMMITTED for another half, guess what consistency guarantees do you really get?

link

brianwawok 3536 days ago

This is not arbitrary. It may be not intuitive coming from a rdbms background, but it isn't arbitrary.

As Johnathon pointed out, it's like properly using synchronized or volatile half the time. I don't call it sometimes not working as arbitrary. I call it expected for not following the rules of the system.

link

tremon 3536 days ago

If I follow your argument correctly, it is basically the same argument as "your C compiler is correct, what you've written is invalid and the standard allows undefined behaviour here".

Which may be a technically valid argument against the compiler/database system, but it's not a valid argument for defending the system as a whole: if a standard allows arbitrary execution instead of bailing out on non-standard (ambiguous) input, it is unreliable.

link

hamax 3536 days ago

Is variable assignment in c/java/... unreliable? It behaves very similar to what C* does. Concurrent access and modification will produce undefined behaviour if you don't explicitly protect it.

link

brianwawok 3536 days ago

Exactly.

Getting access to things like concurrent locks is HARD to get right. That is why there are so many simple languages that don't let you touch concurrency.

Doesn't mean there is no need for it in the world, and no one should be able to use it.

link

pkolaczk 3536 days ago

RDBMSes can cause similar inconsistencies if you don't know what you're doing. It is like setting read uncommitted and then complaining about dirty reads.

link

isoap 3536 days ago

Let's pretend I'm leading a blind child by telling them which direction they should go, and if they don't step carefully, they could be hurt.

If I can't see the child, should I continue to give them direction, or tell them to stop?

In this use case, the database makes changes to data without knowing what is correct and what is harmful. That is not the user's fault. It's a code choice.

link

hamax 3536 days ago

No, it's a users choice to use a DB that has this tradeoff.

Like it was said a couple of time already, all of your complaints about C* would work just as well for locking mechanisms in most popular languages.

link