Hacker News new | ask | show | jobs
by spudlyo 777 days ago
Excluding multi-column keys and joins, I'd describe a column as a foreign key in the context of a given query; i.e. when that query references a unique column in a JOIN condition on a related table. This differs from an explicit declaration of a foreign key constraint which provides all the useful referential integrity characteristics that you eluded to. If you said to me "Database Foo doesn't support foreign keys!" I'd take that to mean that you couldn't perform queries with joins.

In my experience, working on systems where foreign key constraints are liberally applied has been a net negative. Certain classes of DML statements (ON DELETE CASCADE I remember as being infamous) are certainly worse in performance than they otherwise might be. As an administrator I remember being repeatedly and painfully hamstrung by the inability to make arbitrary DB writes, which may momentarily violate strict data integrity, but are necessary for immediate practical reasons.

Obviously data integrity suffers without explicit constraints, but I'd rather work to backfill and/or clean up messy data than deal with a frustratingly rigid and poor performing system. I've worked on a number of large-scale MySQL database deployments at various tech companies, and I can't recall many, if any, that required or possessed pristine referential integrity. I can see why it's conceptually compelling, and I appreciate how automated tooling can generate very useful entity relationship diagrams if FK relationships are explicitly spelled out by constraints.

I think the "performance hit elsewhere" only happens given the assumption that strict referential integrity is a requirement, perhaps in a banking context or another where messy data simply cannot be tolerated.

2 comments

There's no perfect way to do anything in software, especially when it comes to a database. There are always tradeoffs.

There are times when I'd skip constraints and trust the application logic to handle it, sounds like you've run into those as well.

Personally I just have a really high bar when it comes to moving data integrity out of the db and into the app code. Stale data isn't usually my concern, I'm more concerned with how simply and clearly I can define the data contract with anything consuming the data.

When I can guarantee that a column marked as a foreign key will always be a valid foreign key, consumers aren't at risk of a whole class of errors when reading and writing data. Any one query may be marginally slower because the constraint is in the database and always executed, but I know for sure that I'll never have a frontend blowing up because they didn't realize the foreign key isn't really a foreign key, or a backend accidentally writing bad data because the foreign key value wasn't manually checked for validity before being written.

I can say that the larger the team I've been on the less I've seen issues with data integrity loving outside the database, assuming the project is architected well. When an entire team is dedicated to the database and another team, or teams, are dedicated to just the application logic that manages the db, it tends to be much better documented and tested.

Smaller teams have a tendency to along code around much faster and write fewer tests while still finding product-market fit. In those cases, database constraints are an absolute must in my opinion, app logic is just moving too quickly to have any faith in it maintaining data integrity and teams are often growing quickly enough that stuff falls through the cracks.

Honestly as long as someone on the project is seriously considering the tradeoffs, the team is in a pretty damn good spot regardless of what their needs and preferences end up being though!

Indeed, design is the art of balancing tradeoffs I think. I appreciate your argument for constraints being a guard rail for developers who are moving fast and occasionally breaking things. I think nowadays you can enable and disable the checking of constraints in MySQL dynamically without having to restart the database, which would have greatly reduced my past frustrations.

Nice to chat!

I’ve worked at 3 large scale startups now that removed all their FK constraints due to performance and inflexibility issues. The lack of them never really caused an issue - I’ve never missed them. If you have a services / multiple db architecture you’re going to have to deal with dangling references anyway.

I think they’re somewhat obsolete as a concept.