Hacker News new | ask | show | jobs
by saltcured 933 days ago
Coming from a PostgreSQL worldview, I find this confusing. To me a foreign key constraint is about the referential integrity of a table, not an insert-time rule that can have loopholes to leave invalid data in the table. If the constraint is in place, I should be able to trust that any queryable data satisfies the constraint.

Also from my PostgreSQL-infused worldview, it seems to me you are making your life too difficult by requiring a migration to make "one change" to a table or view. The brute force idiom I've seen for schema migrations is to break it into phases:

1. drop departing foreign key constraints

2. restructure table columns/types and values

3. add new foreign key constraints

This is a bit like running with constraints deferred while making data changes that might look invalid until all data changes are done. But, it defers expression of the new constraints until the table structures are in place to support their definitions too, so it isn't just about deferring enforcement.

The same strategy can be used for import scenarios to support schemas where there are circular foreign key reference constraints. I.e. tables are not in a strict parent-child hierarchy.

1 comments

Valid points! In my experience, when someone has foreign key constraints in their database, they tend to develop their apps in "trusting" way. Meaning, the app trusts the DB to maintain referential integrity. When you do the three step breakdown, you remove that integrity, and the app doesn't know any better: it keeps feeding the database with data, the database says "fine", and the app assumes referential integrity is preserved.

This is why in our design PlanetScale will not take upon itself to do this three step change. The user is more than welcome to break this into three different (likely they'll be able to make it in just two) schema changes. But then the user takes ownership of handling unchecked references.

> making data changes that might look invalid until all data changes are done

In effect, the data _will be_ invalid, and potentially for many hours.

Now, it's true that if the user messed up the data in between, then adding the foreign key constraint will fail, in both PostgreSQL and in MySQL. To me, this signals more bad news, because now the user has to scramble to clean up whatever incorrect data they have, before they're able to complete their schema change and unblock anyone else who might be interested in modifying the table.

Personally, my take is to not use foreign key constraints on large scale databases. It's nice to have, but comes at a great cost. IMHO referential data integrity should be handled, gracefully, by the app. Moreover, referential integrity is but one aspect of data integrity/consistency. There are many other forms of data integrity, which are commonly managed by the app, due to specific business logic. I think the app should own the data as much as it can. My 2c.

Ah, in my worldview those steps are still in a single transaction. It's just formulated as several ordered statements.

Is that mechanism unique to PostgreSQL? Or is it just that this transaction is impractical for you, due to the wall clock duration and how it impacts other use of the DB?