Hacker News new | ask | show | jobs
by spotman 4020 days ago
Let me explain better. As eventually consistent is probably the wrong term for what I mean.

In things like a large social media app where the reality is no FKs, and 100k queries-per-second, for instance, you would write into your application logic how to deal with child rows and parent rows, and not rely on FKs to raise an exception, or perform cascading deletes and things.

So if ( when ) there is either a bug, hiccup, or a variety of other reasons that can as you mention make things question correctness (or specifically: referential integrity, if we are going to nitpick), these are usually cleaned up out of band.

So you may have a table you have to slowly iterate through later and remove rows in which their parent rows no longer exist anymore, as an example.

In something like the friend feed schema used in the previous example, read up on how that works, as you will see they sort of turn SQL on its head a bit, to make it more flexible and deal with these shortcomings.

1 comments

On the other hand, if you deal with money, or physical products (like a car) or anything slightly more "important" than your "likes" count on a social site, eventual consistency would get you laughed out of the room pretty quickly.
In fact a lot of financial stuff basically works on eventual consistency, they just don't call it that, wear suits and carry serious expressions. What is double-entry bookkeeping? What is an audit? What is end-of-day? What is end-of-financial-year? What is invoice due date? What is multi-currency accounting? What is a market order on an exchange before you learn what it was sold for? What is a credit card transaction that can be disputed? What is a tax return? ... etc.
Agreed overall, see my comment above which addresses this. These types of systems typically will never see as many transactions/sec as their “likes” counterparts, for obvious reasons.

Having said that, I have worked on some credit card infrastructure that does get up there, and it too, does not have FKs. Having or not having FKs does not magically mean your data will be accurate or in-accurate. ACID, as well as traditional database transactions, still apply, and keep your data integrity high. In conjunction with well designed application logic, and proper planning, FKs are not needed to maintain accurate data whatsoever.

Also, there is commerce related things (Amazon) that work on truly eventually consistent data stores, like DynamoDB. Don’t see amazon getting laughed at much. (they use it for their shopping cart, source: http://www.allthingsdistributed.com/2007/10/amazons_dynamo.h... , this paper is also a great read fwiw)

Anyways, I think we are getting a bit into the weeds here, as the OPs link is about a mysql bug. Cheers.

Well, the shopping carts do not contain data which must be correct so your example does not contradict Pamar.
I agree with you in spirit, but it's simply inaccurate to suggest that a shopping cart does not deal with mission critical data.

the fact is, Amazon does use Dynamo for a mission critical financial purpose. If Dynamo lost data Amazon would not use it for arguably one of the key pieces of their success.

the article I linked covers how they do this, and how it stays accurate, scalable, and robust.

anyways cheers, I hope we have not hijacked this thread about MySQL bugs too much.

I am afraid we are comparing apples to oranges.

First of all, any single shopping cart is typically accessed by one single process at a time: the user's browser. I can of course open two or more browser sessions, but this is not a typical use case, it definitely has an upper limit (how many browser tabs can I use concurrently?) and most importantly, lost or duplicated transactions are easily spotted and corrected (you do get plenty of chances to revise your order before paying).

A bank account can be "hit" by different sources at the same time (just think of a company bank account, not just a personal one) and most importantly, duplicated or lost transactions are much harder to prove and rectify.

The browser tabs have an upper limit sure, because on the client/browser side, it just might be unfeasible to open more than 100 tabs realistically.

But, I can assure you, you could open 1000 tabs, and dynamo will be fine, and not lose your data.

You should really read the article I linked, especially the part about vector clocks, it discusses a common approach for dealing with duplicate data, conflicts, and resolution, at scale.