| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by suzzer99 38 days ago
	I’ve had to do a ton of SQL stuff lately, which I haven’t really worked with since the late 90s. ChatGPT has been a godsend, not just for me, but for our only coworker who knows SQL well, whom I’d probably be bugging several times a day at my wits’ end. But no one cares about those kinds of productivity gains. Just the ones that will completely replace us.

3 comments

djhn 38 days ago

I find SQL and data(bases) in general to be LLM’s Achilles’ heel. Databases are rarely under version control, so the training data only has one half of the knowledge.

My comments are more in the context of OLAP queries and other non-normalised data often queried via SQL.

I train non-LLM transformer models on (older and rarer) datasets, and automating the ingestion of sprawling datasets with hundreds of columns, often in a variety of local languages and different naming conventions adopted over decades, with quite a few duplicated columns…. The LLMs perform badly, it’s nigh impossible to test (for me as a user in prod) and it’s nearly impossible for the LLM companies to test (in training) to RLVR and RLHF this.

flir 38 days ago

That's interesting - SQL is one of the places I find them the strongest - I think there must be an insane amount of training data out there for SQL. But mostly I'm asking them for ad hoc report queries. Nobody cares if they're bad SQL, they just want to know how many signups there were in March that didn't tick the marketing box. Sounds like you're pushing their capabilities a lot further than I am though - I just want to perform arbitarily complex queries on 3NF data.

timacles 37 days ago

Yeah not sure what this guy is talking about, LLMs excel with queries because the SQL language is pretty small in scope and its easy to test the output. Table structure and relationships are easy to feed to the AI.

> I train non-LLM transformer models on (older and rarer) datasets, and automating the ingestion of sprawling datasets with hundreds of columns, often in a variety of local languages and different naming conventions adopted over decades

All of this sounds like basic data processing

hunterpayne 37 days ago

"Nobody cares if they're bad SQL"

Laid off your DBAs I see.

flir 35 days ago

Ok, ok. Nobody who matters cares if they're bad SQL ;)

Tycho 38 days ago

Just use an LLM to make a good knowledge base for the databases. Based on schema info and production queries. An agent can use that to write queries that work.

hypercube33 38 days ago

I'm the old school type who writes out a document that explains what I plan on doing in markdown even if it's generic like "a window with x and y buttons" and the logic flow and then use that to have ai write a plan with me before I send it off to execute it. This has worked super well.

I do enjoy giving the frontier models wacky projects that I can't even find examples of how to do online but I don't expect any results or need them and some have done really well with it while others fall on their face (models)

skydhash 38 days ago

I'm always amazed by those comments. Why couldn't you buy a book on SQL[0], and spend a week on it? Or just go over to YouTube for a refresher?

[0]: Like https://www.oreilly.com/library/view/sql-queries-for/9780134...

suttontom 38 days ago

I'm amazed you think that instead of using an LLM that someone will go buy a book and spend a week learning something that, judging by the fact that they last used it 30 years ago, likely won't be relevant for them soon.

vintermann 38 days ago

It's not only that I rarely use it, it's also that it's ugly. It's Relational Cobol. It's as loveable as Oracle. The vendor specific dialects don't even agree on how to do recursive queries do they?

Unfortunately I am very good at forgetting things I resented having to learn, and SQL is definitively one of them.

nevertoolate 38 days ago

So you don’t understand what you generate with ai and think that it will be a solution for a problem you can only solve using sql.

vintermann 38 days ago

No, it's easy enough to understand the query once the AI has generated it. I have looked up how to do it many times after all.

nevertoolate 37 days ago

Yes

jazzyjackson 37 days ago

If the AI's query pulled what I intended to pull, why should I care to understand the SQL any more than I should understand the Query Plan or the Machine Code?

marshray 37 days ago

There's nothing wrong with using SQL only when you know in advance exactly which records you wish to query.

But if you ever need to query unknown data, then probably you should learn SQL a bit deeper.

alternatex 37 days ago

As with regex, querying is about not getting what you don't want as much as it is about getting what you want. And the former of the two is much more difficult to verify.

flir 38 days ago

SQL is (was?) one of my strongest skills, I enjoy it a lot, and I still reach for the LLM. It's just faster than me, and when it goes wrong (rarely) I can correct it in plain English.

Daishiman 38 days ago

This is fine for a moderately sized query. When your queries start taking in 8 joins and 20 fields per table because you're running queries on Presto with 5 TB of data, not only is it drastically better at writing (because it doesn't mess up the fields), you can ask it to try the query 5 different ways to help you land on the most optimal.

zelphirkalt 38 days ago

That's exactly where I would expect it to fail somewhere, changing some part of the query every time it writes one.

dcrazy 37 days ago

In my experience, Claude (at least Opus and Sonnet) is pretty good about not misremembering itself.

I think you may be describing the experience of 6-12 months ago.

dingaling 38 days ago

This is a great example of AI tech-debt and fragility.

An eight-join query is going to be nigh on unmaintainable should the requirements change, leading to a change-break-change-break spiral as your preferred coding agent tries to fix its previous fixes.

Maybe the wise way to use AI would be to sort out the schema.

array_key_first 38 days ago

This feels wrong. 8 joins is almost certainly reporting stuff, not transactional. Contrary to what some SQL-averse devs think, 300 lines of SQL is actually more maintainable than the equivalent ~1000 lines of application code. It's also much faster. And I do think that's the real conversion, because SQL is a much higher level language than currently available application languages. It's also declarative in nature, which helps maintainance.

A highly normalized DB can easily end up with 8 joins required for some function. That's really not out of the question. "Sorting out" the schema then would be... denormalization, which is a thing, but you need to know why you're doing it. And I think 8 joins isn't enough of a reason.

darkwater 38 days ago

Yes but developers (or at least web backend developers, who are the ones I interact with the most) are extremely averse to SQL and normalization.

marshray 37 days ago

I think that's what was meant by "reporting stuff, not transactional".

ctxc 38 days ago

When you have a general idea of what smells bad vs what's okay...why?

I'd rather get it from the LLM and review

wartywhoa23 38 days ago

Simple, because books don't earn OpenAI and Anthropic a dime.

timacles 37 days ago

A book on .... SQL? What is this, the 1970s?

te_chris 38 days ago

Extremely weird take.