Hacker News new | ask | show | jobs
by ak39 4012 days ago
Foreign keys are not convenience artifacts. They are essential for enforcing data correctness. One doesn't simply opt out of their usage simply because one is "not a fan" of foreign keys.
2 comments

I've experienced several highly-performant, correct databases with concurrent clients from around the globe ... and they had no enforcement of foreign keys.

Yes, foreign keys were essential to data correctness. No, they were not being enforced in production. Stew on that just a bit. Not enforced in production.

During client application development, development databases enforced the constraint. Any errors resulting in foreign key mismatches disqualified the client app for release. The applications were required to be aware of the constraint and to learn to work within it. Production gained the speed benefits of not having to enforce the foreign keys.

You can indeed eat your cake and have it, too, iff you are disciplined.

This might be the case if you have one and only one application interacting with the database, and your unit tests include each and every permutation/edge case.

It is very common to have many applications, from different vendors and from different time periods interacting with a database.

It often also happens that data can be modified directly in the database, not via the application. I agree this is not best practice but in larger enterprises it happens.

Once you start implementing foreign keys in the client you are adding a lot of complexity to a problem that most mature relational database system do well enough to be trusted by a lot of traditional financial institutions.

These kinds of access are not proper discipline. And a lack of proper discipline is the reason to enforce constraints in production. As for "adding complexity to client apps" - how could you possibly create an app to interact with a database and not know about the structure of the database? This is beyond irresponsible. At the least, should an institution desire to allow third-party access to their database, then that access should be provided through a gateway with the intelligence to enforce constraints without relying on the database server.
> It often also happens that data can be modified directly in the database, not via the application.

Yup, 3rd level support "winging it" in production.

And also migration scripts.

Great tale! Much respect to the development and operations teams here. Out of interest, what kind of purpose was the database?
It was in a financial institution. It held every transaction in the financial enterprise.
Nicely put. 100% agreed.
What if it is the difference between your schema working under high transaction rates or not? Or, the difference between buying much more database hardware and having smaller shards to avoid index contention. (which, is $$)

Facebooks mysql architecture (at least used to, likely still is) based on this: https://backchannel.org/blog/friendfeed-schemaless-mysql

Note the simple schema, and lack of FKs.

FKs are rarely present in extremely high transactions per second systems that operate on the same few tables. This is due to index contention and locking.

I say this with lots of experience working on different production systems that see hundreds of thousands of transactions per second. I have yet to see one arrive at these kind of numbers using foreign keys, unless it's something like a giant shared hosting platform, that is operating on hundreds and hundreds of different tables. (therefore, less index contention)

Often, in these shops, data correctness is not validated in realtime in this way, but often in a way that is not in the critical path of answering queries. (more like, eventually consistent). Some places have entire teams for this.

To your point on correctness, my experience ranges a lot in areas where correctness can be eventually consistent. For something like healthcare or banking, you would rather spend the money on way more hardware, because you can't afford correctness to be off. However, with things like globally scaled social apps, this is just not the case.

Most people tend to still use Oracle in those situations. Which is legions slower , and legions more expensive than mysql or pgsql.

>To your point on correctness, my experience ranges a > lot in areas where correctness can be eventually consistent.

"Eventually consistent"? Explain. (Do you mean in an ACID transaction?)

Let me explain better. As eventually consistent is probably the wrong term for what I mean.

In things like a large social media app where the reality is no FKs, and 100k queries-per-second, for instance, you would write into your application logic how to deal with child rows and parent rows, and not rely on FKs to raise an exception, or perform cascading deletes and things.

So if ( when ) there is either a bug, hiccup, or a variety of other reasons that can as you mention make things question correctness (or specifically: referential integrity, if we are going to nitpick), these are usually cleaned up out of band.

So you may have a table you have to slowly iterate through later and remove rows in which their parent rows no longer exist anymore, as an example.

In something like the friend feed schema used in the previous example, read up on how that works, as you will see they sort of turn SQL on its head a bit, to make it more flexible and deal with these shortcomings.

On the other hand, if you deal with money, or physical products (like a car) or anything slightly more "important" than your "likes" count on a social site, eventual consistency would get you laughed out of the room pretty quickly.
In fact a lot of financial stuff basically works on eventual consistency, they just don't call it that, wear suits and carry serious expressions. What is double-entry bookkeeping? What is an audit? What is end-of-day? What is end-of-financial-year? What is invoice due date? What is multi-currency accounting? What is a market order on an exchange before you learn what it was sold for? What is a credit card transaction that can be disputed? What is a tax return? ... etc.
Agreed overall, see my comment above which addresses this. These types of systems typically will never see as many transactions/sec as their “likes” counterparts, for obvious reasons.

Having said that, I have worked on some credit card infrastructure that does get up there, and it too, does not have FKs. Having or not having FKs does not magically mean your data will be accurate or in-accurate. ACID, as well as traditional database transactions, still apply, and keep your data integrity high. In conjunction with well designed application logic, and proper planning, FKs are not needed to maintain accurate data whatsoever.

Also, there is commerce related things (Amazon) that work on truly eventually consistent data stores, like DynamoDB. Don’t see amazon getting laughed at much. (they use it for their shopping cart, source: http://www.allthingsdistributed.com/2007/10/amazons_dynamo.h... , this paper is also a great read fwiw)

Anyways, I think we are getting a bit into the weeds here, as the OPs link is about a mysql bug. Cheers.

Well, the shopping carts do not contain data which must be correct so your example does not contradict Pamar.