Hacker News new | ask | show | jobs
by jihadjihad 60 days ago
> You did not write a single line of SQL. You did not set up an attribution model. You asked a question, in English, and got a table.

But nobody bothered to check if it was correct. It might seem correct, but I've been burned by queries exactly like these many, many times. What can often happen is that you end up with multiplied rows, and the answer isn't "let's just add a DISTINCT somewhere".

The answer is to look at the base table and the joins. You're joining customers to two (implied) one-to-many tables, charges and email_events. If there are multiple charges rows per customer, or an email can match multiple email_events rows, it can lead to a Cartesian multiplication of the rows since any combination of matches from the base table to the joined tables will be included.

If that's the case, the transactions and revenue values are likely to be inflated, and therefore the pretty pictures you passed along to your boss are wrong.

Further reading, and a terrific resource:

https://kb.databasedesignbook.com/posts/sql-joins/#understan...

2 comments

Ok but… nobody said you didn’t had to check either(?).
How do you check if you don't have any other view into the data but SQL and you don't know SQL?
Sql takes at most an afternoon to learn enough of to navigate a database with
Same way you do today; you trust whoever wrote the query.

I do not sell a wrapper on top of some LLM; you can absolutely write your SQL directly. There is an engine, there are iceberg tables. You can just live your best life doing your own SQL by hand.

Now if you couldnt do it before and you have a sensible understanding, you can likely do a bit more with the CLI tooling. And if you know a lot more, you can still do that. The queries are not hidden, or abstracted, If you need them they will be saved - transparently in SQL.

So I dont know what is the answer to the question "how do people do things they don't know how to do" ?

> So I dont know what is the answer to the question "how do people do things they don't know how to do" ?

The statue quo had been to learn SQL or ask a human you trust to check their own work, which hopefully you can reuse.

Now it's ask AIs that are intentionally a bit random, and less likely to (or incapable of) check(ing) their work. Perhaps without seeing the SQL at all, requiring to trust it for every interaction. And in a culture that moves so fast that there is no checking by any(one|thing).

If you think a language model can't check their work, then you are using the tools wrong. Plain and simple.

Modern models are quite capable at surfacing and validating their assumptions and checking correctness of solutions.

Oversight helps you build confidence in the solutions. Is it perfect, no.. but way better then most engineers I also ask to check things.

No they don't. To be able to "check one's work", implies that they can be held accountable, that they can tell apart right from wrong, when in reality they're merely text predictors.

If you think an LLMs can check their work, then you are doing a terrible job at writing software. Plain and simple.

They even go as far as "cheating", so tests fail, writing incorrect tests, or straight out leaking code (lol) like the latest Claude Code blunder. Is this the tool the original comment "is using wrong, plain and simple"? Or do you have access to some other model that works in a wildly different way than generating text predictions?

Yes. The correct answer is to ask an SQL expert to write the query. An LLM will make the expert much faster at writing the script, maybe.
It doesn't.

I can write that script faster than I can write the text asking the AI to write the script as SQL is concise and my IDE has auto-complete.

In fact auto-complete in VS code with copilot works surprisingly well. Let’s say there are patterns in how classes in a framework are constructed, and I already added changes to the schema, it will often emit the correct implementation code when going there. That is really speeding me up even if I know the code base very well.
SQL is not hard enough to require an LLM to think about for you

I will never understand Engineers who struggle with SQL lookups. The vast majority of queries are extremely basic set theory

The harder part is understanding the nature of the data you're working with. There's always some catch ("oh that field `foo` was never backfilled, so for queries before 2020 you have to recompute it by joining with legacyBar instead")
> SQL is not hard enough to require an LLM to think about for you

As someone who's seen queries that are hundreds of lines long, involve a bunch of CTEs, nested SELECTs as well, upwards of a dozen joined tables with OTLT and EAV patterns all over the place (especially the kind of polymorphic links where you get "type" not "table_name" so you also need to look at the app code to understand it), I'd say that SQL can be too hard for people to reason about well.

Bonus points for having to manually keep like 5 Oracle package contents in your working memory cause that's where the other devs on the 10 year old project stored some of the logic, while the remainder is sort-of-dynamic codegen in the app.

Same as with most app code, it shouldn't be like that, but you sometimes get stuff that is really badly developed and the cognitive load (both to inherent and accidental complexity) will increase until people will just miss things and not have the full picture.