Hacker News new | ask | show | jobs
by _heimdall 777 days ago
I'm actually curious what the distinction is in your view. I've never really considered a column to be a foreign key when the constraint isn't used.

Having a column that we give business logic context to is useful, and indexing a column that should contain values for another table is helpful for query speed, but at least in my opinion they really aren't foreign keys unless that constraint lives directly in the database layer itself. I'd say the same for columns that are used as unique identifiers without actually adding unique constraints to the column.

There are good performance reasons to do either one if you're willing to take on the data integrity responsibility in the application code, but the column itself really is just a typed column if the constraints live elsewhere (again in my opinion, I think the technical definitions may ignore this functional argument).

Where I find foreign key constraints helpful for queries is when I need to be absolutely sure of the data integrity. Say I need to make a complex query that joins across three different tables based on foreign keys. If the table constraints exist I know that (a) any value in a foreign key column is valid and the referenced key exists and (b) if no rows are found I can trust that its just because none exist.

Without foreign key constraints, I may not know why the query didn't find any results. It could be because there just aren't any matching rows, it could also be that one of the keys is no longer valid (or never was). If I don't care about that second error state my query may not change much, but if I need to know why the query failed to match and handle any invalid data accordingly I couldn't do it.

When writing, I also much prefer having a single insert that I know will fail if the foreign key isn't valid. This could be done with a more complex query, or a transaction, but then I'd be taking on that responsibility when it could live directly in the db. Beyond the complexity there, I have to assume the database authors would be able to write a more efficient foreign key validation check them I could from my end.

That said, what's been your experience handling foreign keys when the constraint is either unsupported or unused? Do you avoid it mainly for the bump in query performance, and if so how do you avoid that performance hit elsewhere in your code?

1 comments

Excluding multi-column keys and joins, I'd describe a column as a foreign key in the context of a given query; i.e. when that query references a unique column in a JOIN condition on a related table. This differs from an explicit declaration of a foreign key constraint which provides all the useful referential integrity characteristics that you eluded to. If you said to me "Database Foo doesn't support foreign keys!" I'd take that to mean that you couldn't perform queries with joins.

In my experience, working on systems where foreign key constraints are liberally applied has been a net negative. Certain classes of DML statements (ON DELETE CASCADE I remember as being infamous) are certainly worse in performance than they otherwise might be. As an administrator I remember being repeatedly and painfully hamstrung by the inability to make arbitrary DB writes, which may momentarily violate strict data integrity, but are necessary for immediate practical reasons.

Obviously data integrity suffers without explicit constraints, but I'd rather work to backfill and/or clean up messy data than deal with a frustratingly rigid and poor performing system. I've worked on a number of large-scale MySQL database deployments at various tech companies, and I can't recall many, if any, that required or possessed pristine referential integrity. I can see why it's conceptually compelling, and I appreciate how automated tooling can generate very useful entity relationship diagrams if FK relationships are explicitly spelled out by constraints.

I think the "performance hit elsewhere" only happens given the assumption that strict referential integrity is a requirement, perhaps in a banking context or another where messy data simply cannot be tolerated.

There's no perfect way to do anything in software, especially when it comes to a database. There are always tradeoffs.

There are times when I'd skip constraints and trust the application logic to handle it, sounds like you've run into those as well.

Personally I just have a really high bar when it comes to moving data integrity out of the db and into the app code. Stale data isn't usually my concern, I'm more concerned with how simply and clearly I can define the data contract with anything consuming the data.

When I can guarantee that a column marked as a foreign key will always be a valid foreign key, consumers aren't at risk of a whole class of errors when reading and writing data. Any one query may be marginally slower because the constraint is in the database and always executed, but I know for sure that I'll never have a frontend blowing up because they didn't realize the foreign key isn't really a foreign key, or a backend accidentally writing bad data because the foreign key value wasn't manually checked for validity before being written.

I can say that the larger the team I've been on the less I've seen issues with data integrity loving outside the database, assuming the project is architected well. When an entire team is dedicated to the database and another team, or teams, are dedicated to just the application logic that manages the db, it tends to be much better documented and tested.

Smaller teams have a tendency to along code around much faster and write fewer tests while still finding product-market fit. In those cases, database constraints are an absolute must in my opinion, app logic is just moving too quickly to have any faith in it maintaining data integrity and teams are often growing quickly enough that stuff falls through the cracks.

Honestly as long as someone on the project is seriously considering the tradeoffs, the team is in a pretty damn good spot regardless of what their needs and preferences end up being though!

Indeed, design is the art of balancing tradeoffs I think. I appreciate your argument for constraints being a guard rail for developers who are moving fast and occasionally breaking things. I think nowadays you can enable and disable the checking of constraints in MySQL dynamically without having to restart the database, which would have greatly reduced my past frustrations.

Nice to chat!

I’ve worked at 3 large scale startups now that removed all their FK constraints due to performance and inflexibility issues. The lack of them never really caused an issue - I’ve never missed them. If you have a services / multiple db architecture you’re going to have to deal with dangling references anyway.

I think they’re somewhat obsolete as a concept.