Hacker News new | ask | show | jobs
by consteval 662 days ago
Yes when you have duplicated data and data inconsistencies/integrity issues you might get duplicate data and data inconsistencies/integrity issues in your output.

This is a problem of form, not method. JOINs are a fantastic, well-defined method to aggregate data. If the form of your data is messed up, then naturally the result may be too.

> no I’m not interested in helping you troubleshoot your 7-page SQL query that’s stacked on top of multiple layers of nested SQL views that’s giving you too many rows

People say this type of thing but SQL is an incredibly high-level language.

Yes debugging a big SQL query can suck. Debugging the equivalent of it is almost always much worse. I'd rather debug a 30-line query than a 400 line perl script that does the same thing. Because that's actually the alternative.

I have manually aggregated data, messy data, in awk, perl, python... it is much, much worse.

1 comments

> ”I have manually aggregated data, messy data, in awk, perl, python... it is much, much worse.”

Yes, but with Python/etc you can at least do the same logic in a plain loop, which is much slower but serves as a more reliable reference of what the data is supposed to be, which can be used to validate the functionality of SQL output.

Is there an equivalent in SQL of this “slow and dumb” approach for validating? Like, I’m not sure if a lateral join is essentially doing the same thing under the hood.

Most databases have the concept of temporary tables that will automatically disappear when your session ends. For troubleshooting, I would breakdown each step and save it in a temp table. Validate it. Then use it as the input for the next step. Etc.