|
Not sorry, I’ll stick with SQL non-expert as I’ve only worked with databases for a few decades and sometimes run into people who know more. Working with a database you built or can control is kind of a simplistic example. In my experience this most often arises when it’s someone else’s database or API you’re interacting with and do not control. An upstream ERP system doesn’t have unique keys, or it changes the name of a column, or an accounting person adds a custom field, or the accounting system does claim to have unique identifiers but gets upgraded and changes the unique row identifiers that were never supposed to change, or a user deletes a record and recreates the same record with the same name, which now has a different ID so the data in your reporting database no longer has the correct foreign keys, and some of the data has to be cross-referenced from the ERP system with a CRM that only has a flaky API and the only way to get the data is by pulling a CSV report file from an email, which doesn’t have the same field names to reliably correlate the data with the ERP, and worse the CRM makes these user-editable so one of your 200 sales people decides to use their own naming scheme or makes a typo and we have 10 different ways of spelling “New York”, “new york”, “NY”, “newyork2”, “now york”, and yeah… Turns out you can sometimes end up with extra rows despite your best efforts and that SQL isn’t always the best tool for joining data, and no I’m not interested in helping you troubleshoot your 7-page SQL query that’s stacked on top of multiple layers of nested SQL views that’s giving you too many rows. You might even say I’m averse. |
This is a problem of form, not method. JOINs are a fantastic, well-defined method to aggregate data. If the form of your data is messed up, then naturally the result may be too.
> no I’m not interested in helping you troubleshoot your 7-page SQL query that’s stacked on top of multiple layers of nested SQL views that’s giving you too many rows
People say this type of thing but SQL is an incredibly high-level language.
Yes debugging a big SQL query can suck. Debugging the equivalent of it is almost always much worse. I'd rather debug a 30-line query than a 400 line perl script that does the same thing. Because that's actually the alternative.
I have manually aggregated data, messy data, in awk, perl, python... it is much, much worse.