Hacker News new | ask | show | jobs
by daxelrod 3725 days ago
It's long bothered me that SQL doesn't have a way of saying "join these two tables based on the key relationships I've already defined in my schema". (Or maybe some variants do?)

The database already knows that guest.reservation_id is a foreign key referencing reservation.id. Why should you have to repeat yourself?

4 comments

I have longed for that, too, but I'm not sure I should want it.

Problem is that it becomes problematic as soon as you keep such queries around (in stored procedures, source code, or batch files), and then modify your schema by adding or removing a foreign key.

What if you have more then one foreign key referencing the same table? Or worse, what if you had one and later add another? You would get undefined behavior.

Added:

Also SQL language itself has nothing to do with keys, relationships and other details of database design. It's different level of abstraction.

Various query builders/SQL generators do what you want.

And when there are multiple foreign keys between two tables?
Genuinely curious, what's the use-case for this?
Some examples:

In project management, tasks might have an assigner_id and assignee_id

In transaction management, transactions might have a sender_id and receiver_id

In sports, matches might have a team1_id and team2_id

> In transaction management, transactions might have a sender_id and receiver_id

in normal form, shouldn't you have "transactions", "transaction_senders", and "transaction_receivers" anyway? similarly for the other cases?

i know sometimes the simplicity of a single table is a valid choice, so your point still stands, but it does feel like the examples are cases of "well, if you're were doing the right way(TM), that wouldn't happen"

shouldn't you have "transactions", "transaction_senders", and "transaction_receivers"

were doing the right way(TM), that wouldn't happen

Having a separate transaction_senders and transaction_receivers wouldn't be the right way(TM) unless it's possible to have multiple senders/receivers. If a transaction can only have 1 sender/receiver then normalization is complete when the sender/receiver data is removed from 'transactions' table and replaced with the sender_id and receiver_id columns.

When introducing transaction_senders/transaction_receivers tables without further constraints, it would be immediately possible to have multiple transaction senders and multiple transaction receivers for a single transaction, which is likely an error.

These examples depend on the use case. If you are guaranteed to only have one sender and one receiver, then having them in the same table is already normal form.

Perhaps (?) a less controversial example is a table of flights between airports. Should a single flight have departure_ids and destination_ids? (assuming we treat legs as separate flights)

Or what about package delivery? Do we need sender_ids and receiver_ids there?

created_by and updated_by both pointing at the user table?
Explicit foreign keys perform better in some joins, as the optimiser can assume that all possible values are in the source column of that foreign key.
> join these two tables based on the key relationships I've already defined in my schema

This is almost exactly what "NATURAL JOIN" does. It is like "JOIN USING" with a pre-filled column list.

However, this isn't decided based on the schema, but based on equal column names. So you need to apply the dicipline as noted by the parent comment.

See also the sibling discussion thread at: https://news.ycombinator.com/item?id=11450187