HN Mirror


by Magmalgebra 497 days ago

I think this post underestimates the degree to which “what data is correct” is deeply contextual.

My team came up with a hypothesis identical to this doc's ~2 years ago and built a proof of concept. It was pretty magical: we had Fortune 500 execs asking for reports on internal metrics, and the reports would be generated in a couple of minutes. The first week we got rave reviews, followed by an immediate round of negative feedback as we realized that ~90% of the reports were deeply wrong.

Why were they wrong? It had nothing to do with the LLMs per se; o3-mini doesn’t do much better on our suite than GPT-3.5. The problem was that knowing which data to use for which query was deeply contextual.

Digging into use cases you’d find that for a particular question you needed to not just get all the rows from a column, you needed to do some obscure JOIN ON operation. This fact was only known by 2 data scientists in charge of writing the report. This flavor of problem (data being messy, with the messiness only documented in a few people’s brains) repeated over and over.

I still work on AI powered products and I don’t see even a little line of sight on this problem. Everyone’s data is immensely messy and likely to remain so. AI has introduced a number of tools to manage that mess, but so far it appears they’ll need to be exposed via fairly traditional UIs.

10 comments

burnte 497 days ago

> I think this post underestimates the degree to which “what data is correct” is deeply contextual.

I can't get anyone to listen to this point. I'm seeing plans going full steam ahead deploying AI when they don't even have a good definition of the PROBLEM much less how to train the AI to do things well and correctly. I was in a 90-minute meeting with some execs who were all high on ChatGPT Operators. The presenter was saying we could replace 80 people at this company RIGHT NOW with this tool. I asked the presenter to type in one simple request to the AI; the entire demo went wildly off the rails from then on, and the presenter wasn't even remotely bothered by that. People are either completely taken in by the marketing and believe like it's a religion, or they have solid, sensible concerns about reliability. But the second group is smaller than the group of true believers.

lfxyz 496 days ago

> People are either completely taken in by the marketing and believe like it's a religion, or they have solid, sensible concerns about reliability.

The other issue is that the first group are labelled as innovative go-getters, while the second group are labelled as negative crusty curmudgeons and this has an impact on the careers of both groups.

OccamsMirror 497 days ago

Executives are salivating at the opportunity to cut 80 staff with no repercussions. They're so eager they're keeping the blinkers on. The fact that the AI has no clothing is ignored because the promise is so attractive.

burnte 496 days ago

Yep. He said any company that isn't all-in on AI will be out of business next year. I really want to see ChatGPT Operators fix my roof.

satvikpendem 496 days ago

> I can't get anyone to listen to this point. I'm seeing plans going full steam ahead deploying AI when they don't even have a good definition of the PROBLEM much less how to train the AI to do things well and correctly.

First time? I did AI work years before the current generative AI boom and it was the same then too, managers wanted to stick AI into everything without even knowing what the hell they actually wanted in the end.

burnte 496 days ago

First time with AI, yeah, but I am not surprised since I've seen it with every other tech fad. People come up with solutions to implement, not problems to solve.

sansseriff 497 days ago

It will be interesting to see in which fields it's worth the effort to curate your data to a high enough standard that you get all the benefits of an AI agent.

I'm currently working as a scientist. I wonder if researchers will be willing to annotate their papers, data, reasoning, and arguments well enough that AI agents can make good use of it all.

If you write your papers in an AI-friendly way, maybe that means more citations? Does this mean switching to new publishing formats? PDFs are certainly limiting.

aeturnum 497 days ago

I think a lot of the power and capability of LLMs comes from their understanding of a lot of implicit context in language. But generally LLMs will have a dominant understanding of each linguistic construct, and if that understanding isn't correct, they struggle.

We've looked at using agents at my current job but most of the time, once the data is properly structured, a more traditional approach is faster and less expensive.

TaurenHunter 497 days ago

That must be the reason why Palantir and other AI companies are using the concept of "ontology".

We can't let an LLM loose on a database and expect it to figure out everything.

joshstrange 496 days ago

Yep, I had a similar experience around a year or so ago. Hooking an LLM up to my RDBMS was really cool for the first 1-2 questions but fell over almost immediately with questions that strayed much further than “how many rows are in this table”.

Sure, you can do some basic filtering (though it would fail even there, making bad assumptions), and any (correct) joins were a crap-shoot. I was including the schema and sample rows from all my tables, and I wrote tens of lines of instructions explaining the logic of the tables, but that still didn’t begin to cover all the cases.

Prompt engineering tons of business logic is a horrible job. It's hard to test and it feels so “squishy” and unreliable. Even with all of my rules, it would write queries that didn’t work and/or broke a rule or concept that I had laid out.

In my experience, you’re much better off using AI to help you write some queries that you add to the codebase (after tweaking/checking) than having AI come up with queries at run time.
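A minimal sketch of that pattern, assuming a hypothetical `VETTED_QUERIES` registry and `run_vetted` helper (neither comes from the comment): queries an LLM helped draft are reviewed and committed as named, parameterized SQL, and at runtime the system can only pick one by name rather than improvise new SQL. Python's built-in `sqlite3` stands in for the real database.

```python
import sqlite3

# Hypothetical registry of reviewed, parameterized queries. An LLM may have
# helped draft them, but each one was checked by a human before being committed.
VETTED_QUERIES = {
    "order_count": "SELECT COUNT(*) FROM orders",
    "orders_for_customer": (
        "SELECT id, total FROM orders WHERE customer_id = ? ORDER BY id"
    ),
}


def run_vetted(conn: sqlite3.Connection, name: str, params: tuple = ()) -> list:
    """Run a committed query by name; unknown names fail instead of generating SQL."""
    if name not in VETTED_QUERIES:
        raise ValueError(f"no vetted query named {name!r}")
    # Parameters are bound by sqlite3, never interpolated into the SQL string.
    return conn.execute(VETTED_QUERIES[name], params).fetchall()


# Usage against a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(1, 10.0), (2, 5.5), (1, 7.25)],
)
print(run_vetted(conn, "order_count"))                # [(3,)]
print(run_vetted(conn, "orders_for_customer", (1,)))  # [(1, 10.0), (3, 7.25)]
```

The point of the design is where the squishy part happens: SQL gets written and tested at development time, where a person can check it, and runtime behaviour is limited to a fixed, reviewable set.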

yibg 497 days ago

Completely agree. Even things that are considered "standard" or "basic" sometimes have deep contextual variances. For instance, a basic question like "what is my ARR this month" can have different answers for different businesses.
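To make the ARR point concrete, here is a toy calculation with made-up subscriptions and two hypothetical definitions (`arr_all_recurring` and `arr_paid_only`). Which one is "right" depends on the business, and that is exactly the context a model querying the raw table would not have.

```python
# Made-up subscriptions: (monthly_recurring_amount, is_free_trial).
SUBSCRIPTIONS = [(100.0, False), (50.0, True), (200.0, False)]


def arr_all_recurring(subs):
    """Hypothetical definition: annualize every recurring amount, trials included."""
    return 12 * sum(amount for amount, _ in subs)


def arr_paid_only(subs):
    """Hypothetical definition: annualize only paying customers, excluding trials."""
    return 12 * sum(amount for amount, trial in subs if not trial)


print(arr_all_recurring(SUBSCRIPTIONS))  # 4200.0
print(arr_paid_only(SUBSCRIPTIONS))      # 3600.0
```

Same rows, same column, two defensible answers to "what is my ARR this month".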

lukev 497 days ago

This is absolutely the problem. But there is a line of sight: combining LLMs with existing semantic data technologies (e.g., RDF).

This is why I'm building a federated query optimizer: we want to let the LLM reason and formulate queries at the ontological level, with query execution operating behind a layer of abstraction.

Magmalgebra 497 days ago

Unfortunately this doesn't address the problem I'm describing.

My team had these ontologies available to the LLM and provided it in the context window. The queries were ontologically sensible at a surface level, but still wrong.

The problem is that your ontology is rapidly changing in non-obvious and hard-to-document ways, e.g. "this report is only valid if it was generated on a Tuesday or Thursday after 1pm because that's when the ETL runs; at any other time the data will be incorrect".
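Part of why the ontology alone was not enough: a rule like this lives nowhere in the schema, so someone has to write it down explicitly. A minimal sketch, taking the Tuesday/Thursday-after-1pm window from the comment literally and using a hypothetical `report_is_valid` guard; a real system would more likely check the ETL's last successful run than hard-code a calendar rule.

```python
from datetime import datetime

# Hypothetical encoding of the rule quoted above. datetime.weekday() counts
# Monday as 0, so Tuesday is 1 and Thursday is 3.
VALID_WEEKDAYS = {1, 3}
ETL_DONE_HOUR = 13  # 1pm, after the ETL has run


def report_is_valid(now: datetime) -> bool:
    """True only inside the window where the underlying data is trustworthy."""
    return now.weekday() in VALID_WEEKDAYS and now.hour >= ETL_DONE_HOUR


print(report_is_valid(datetime(2025, 2, 4, 14, 0)))  # Tuesday 2pm: True
print(report_is_valid(datetime(2025, 2, 5, 14, 0)))  # Wednesday 2pm: False
```

Nothing in a table description would tell a model about this check, which is the gap the comment is pointing at.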

creaghpatr 497 days ago

The analysts know where the bodies are buried, so to speak. The execs may not even be aware there are bodies.

jaennaet 497 days ago

This got me curious about what "queries at the ontological level" means in concrete terms. It's been a good long while since I did anything even remotely data-engineering-like, and back then "AI" could mean something like a support vector machine (yay moving goalposts), so I haven't had to deal with this sort of stuff at all.

abakker 497 days ago

Line of sight to a problem solving architecture, while cool, is nowhere near line of sight on upgrading the existing crappy data that is critically intertwined with literal thousands of apps in a typical enterprise.

SoftTalker 497 days ago

Did the execs immediately recognize that the reports were wrong, or did some analyst working in a cubicle on the 9th floor point that out?

Magmalgebra 497 days ago

Usually the analyst, but sometimes the exec: it's hard to miss when a report implies your revenue has shifted 90%+ in either direction since the last time you read a report :)

llm_trw 497 days ago

The intern under the analyst did.

llm_trw 497 days ago

>Digging into use cases you’d find that for a particular question you needed to not just get all the rows from a column, you needed to do some obscure JOIN ON operation. This fact was only known by 2 data scientists in charge of writing the report.

>I still work on AI powered products and I don’t see even a little line of sight on this problem. Everyone’s data is immensely messy and likely to remain so.

I've worked in the space as well and completely unstructured data is better than whatever you call a database with a dozen ad hoc tables each storing information somewhat differently to each other for reports written by a dozen different people over a decade.

I have a benchmark for an agentic system that measures how many joins between tables the system can do before it goes off the rails. There is nothing off the shelf that does this, and for whatever reason no one is talking about it in the open. But there are companies quietly working to solve it; I've worked with three so far.

Without documentation giving some grounding about what the table is doing, you're left with hoping the database is self documenting enough for the agent to figure out what the column names mean and if joining on them makes sense - good luck doing it on id1, id2, idCustomerLocal, id_customer_foreign though.

Magmalgebra 497 days ago

Descriptions of tables are insufficient (we had them); you also need descriptions of the systems writing to the tables.

My favorite example was a report that was only accurate if generated on a Tuesday or Thursday due to when the ETL pipeline ran. A small config change on the opposite side of a code base completely altered the semantics of the data!

llm_trw 497 days ago

If you're interested please drop an email. I've only worked deeply with pipelines extracting data from documents and I'd be interested in hearing what the challenges with databases are.

mooreds 496 days ago

> Digging into use cases you’d find that for a particular question you needed to not just get all the rows from a column, you needed to do some obscure JOIN ON operation. This fact was only known by 2 data scientists in charge of writing the report. This flavor of problem (data being messy, with the messiness only documented in a few people’s brains) repeated over and over.

This reminds me of one of the key plot points in "The Sparrow" by Mary Doria Russell. Small spoiler ahead so if you haven't read it and want to be surprised, stop reading.

...

...

Basically, one of the characters works as an AI implementer, replacing humans in their jobs by learning deeply how they do their work and coding up an AI replacement. She runs across a SETI researcher and works on replacing him, but he has a human intuition for matching signals that she would never have discovered because it was so random.

Great book if you haven't read it.