| > Junction tables are how you represent M:N relationships. Yeah, the problem is that when you get to 4NF+, you're often looking at creating a new table, joined through a junction table, for a single multi-valued field that holds just one value a plurality or majority of the time. So you need the base table, the junction table that has at least two columns, and the actual data table. So, you've added two tables, two foreign key constraints, two primary key indexes, potentially more non-clustered indexes... and any query now needs two joins. And data validation is hard because you need to use an anti-join to find missing data. Or, you can go with a 1:N relationship. Now you have only one more table, at the cost of potentially duplicating values between entities. But if we're talking about, say, telephone numbers? Sure, different entities might share the same phone number. Do you need a junction table so you don't duplicate a phone number? You're certainly not saving disk space or improving performance by doing that unless there are regularly dozens of individual records associated with a single phone number. And if the field is 1:1... or even 90% or 95% 1:1... do you really need a separate table just so you don't store a NULL in a column? You're not going to be eliminating nulls from your queries. They'll be full of LEFT JOINs everywhere; three-valued logic isn't going anywhere. > Databases must be correct above all else; if they're fast but wrong, they're useless. Yeah, and if they're "correct" but can't return data in a timely manner, they're also useless. A database that's a black hole is not an improvement. If it takes 20 joins just to return basic information, you're going to run into performance problems as well as usability problems. And if 18 of those joins describe fidelity that you don't even need? |
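To make the join-count comparison in the comment above concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`person`, `phone`, `person_phone`, `person_phone_1n`) and the sample rows are hypothetical, chosen only to illustrate the phone-number case; nothing here is taken from a real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- M:N design: base table, junction table, and data table (hypothetical names).
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE phone  (phone_id  INTEGER PRIMARY KEY, number TEXT NOT NULL UNIQUE);
CREATE TABLE person_phone (
    person_id INTEGER NOT NULL REFERENCES person(person_id),
    phone_id  INTEGER NOT NULL REFERENCES phone(phone_id),
    PRIMARY KEY (person_id, phone_id)
);

-- 1:N design: a single child table; the same number may repeat across people.
CREATE TABLE person_phone_1n (
    person_id INTEGER NOT NULL REFERENCES person(person_id),
    number    TEXT NOT NULL,
    PRIMARY KEY (person_id, number)
);

INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO phone  VALUES (1, '555-0100'), (2, '555-0101');
INSERT INTO person_phone    VALUES (1, 1), (2, 1), (2, 2);
INSERT INTO person_phone_1n VALUES (1, '555-0100'), (2, '555-0100'), (2, '555-0101');
""")

# M:N: two joins to get from a person to their numbers.
mn_rows = conn.execute("""
    SELECT p.name, ph.number
    FROM person p
    JOIN person_phone pp ON pp.person_id = p.person_id
    JOIN phone ph        ON ph.phone_id  = pp.phone_id
    ORDER BY p.name, ph.number
""").fetchall()

# 1:N: one join for the same answer, with '555-0100' stored twice.
one_n_rows = conn.execute("""
    SELECT p.name, c.number
    FROM person p
    JOIN person_phone_1n c ON c.person_id = p.person_id
    ORDER BY p.name, c.number
""").fetchall()

# Anti-join: people with no phone number recorded in the M:N design.
missing = conn.execute("""
    SELECT p.name
    FROM person p
    LEFT JOIN person_phone pp ON pp.person_id = p.person_id
    WHERE pp.person_id IS NULL
""").fetchall()

print(mn_rows)
print(one_n_rows)
print(missing)
```

Both queries return the same person/number pairs; the difference is one extra table and one extra join in the M:N version versus one duplicated string in the 1:N version. Whether that trade is worth it depends on how often values are actually shared.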
I've never had to do 18 joins to extract information in my career. I'm sure such cases legitimately exist, but they are rare, even in large enterprises. Most companies are more than capable of distinguishing OLTP from OLAP and real-time from batch, and of designing (or redesigning) accordingly.
Databases and their designs shift with the use case.