Hacker News new | ask | show | jobs
by estetlinus 65 days ago
The lost art of normalizing databases. ”Why is the ARR so high on client X? Oh, we’re counting it 11 times lol”.

I would maybe throw in date as an key too. Bad idea?

2 comments

It depends on if you are doing OLTP (granular, transactional) vs OLAP (fact/date based aggregates) - dates are generally not something you'd consider in a fully normalized flow to uniqify records.
Makes sense. I’m an OLAP guy.
Well you are in luck, if someone tells your to normalize you can tell them their source system can fuck right off.
Frankly I don't think that overcounting is solved by normalizing, because it's easy to write an overcounting SQL query over perfectly normalized data.

I tried to explain the real cause of overcounting in my "Modern Guide to SQL JOINs":

https://kb.databasedesignbook.com/posts/sql-joins/#understan...

I'll go one further and say that if you're reaching for DISTINCT and you have joins, you may have joined the data the wrong way. It's not a RULE, but it's ALWAYS a 'smell' when I see a query that uses DISTINCT to shove away duplicate matches. I always add a comment for the exceptions.
Great read, thank you!