Hacker News new | ask | show | jobs
by whatever1 811 days ago
The trial and error is the worst part.

In traditional languages, you can print iteration by iteration the intermediate result and understand if there is something wrong.

In SQL you sample output, and you keep changing the query until you think you get it right. And then 2 years later someone else finds that the query was wrong all this time.

5 comments

Common Table Expressions (CTE) do help a little, as you can query each “table” and inspect the output. Debugging a giant query with deeply nested sub queries is very painful indeed
So do table variables and temp tables.
> The trial and error is the worst part.

I don't know about anyone else, but I do this kinda naturally when writing SQL queries. Usually start with a base table, query the first 100 rows to see what the data looks like, start joining on other tables to get info I need, querying as I go to check join conditions, perhaps build out some CTEs if I need to do some more complex work, query those to check the format of the data ... And so on.

It doesn't feel that different to any other programming in that sense. Querying is printing.

> you can print iteration by iteration the intermediate result

You would not be able to do that with a multi-threaded/multi-process application.

And this is the reason why e.g. Trino/Presto is so powerful together with SQL.

Instead of telling the computer how to go by to get your result, you tell it what result you want and let it do it in the best way.

The most up-front way of telling a computer "how" is a for-loop. And SQL does not have it. It may seem limiting, but avoiding explicit for loops gives the freedom to the computer. If it sees it fit to distribute that calculation over 200 distributed CPUs it can do that. With an imperative language you need to tell the computer exactly how it should distribute it. And from there it gets really hairy.

In development I don't need it to be multi-threaded. 1 thread is fine, as long as I can explain, step-by-step, how the calculations produced the output.
If you don't need threads in development OR production, you might as well do SELECT * from users and do the join in your imperative code.

If you need threads in production I think you will end up getting rid of your for loops anyway (or possibly, if you really want to, end up in a mutex/semaphore quagmire).

I must say, though, that there are other benefits with a declarative approach than just avoiding threading issues. But I guess it takes some getting used to.

I would say that the same "I cant step through my code" argument also goes for functional style code.

> If you don't need threads in development OR production, you might as well do SELECT * from users and do the join in your imperative code.

Except that it most likely will be orders of magnitude slower. Most databases are very good at what they are doing.

Yes. Kind of my point to. But the OP missed the possibility to step through the code.
sure you can. set the concurrency limit to 1. If you're debugging the logic and not some race condition then this works perfectly fine. Remember to profile afterwards though
Trial and error is usually a bad idea in all kinds of programming.
I mean, I never build a query from front to back. Usually I build it FROM -> JOIN -> WHERE -> SELECT.
Start off with SELECT * then once the joins are working, filter * down to the essentials.