| I think this post underestimates how the degree to which “what data is correct” is deeply contextual. My team created an identical hypothesis to this doc ~2 years ago and generated a proof of concept. It was pretty magic, we had fortune 500 execs asking for reports on internal metrics and they’d generate in a couple of minutes. First week we got rave reviews - followed by an immediate round of negative feedback as we realized that ~90% of the reports were deeply wrong. Why were they wrong? It had nothing to do with the LLMs per se, 03-mini doesn’t do much better on our suite than gpt 3.5. The problem was that knowing which data to use for which query was deeply contextual. Digging into use cases you’d fine that for a particular question you needed to not just get all the rows from a column, you needed to do some obscure JOIN ON operation. This fact was only known by 2 data scientists in charge of writing the report. This flavor or problem - data being messy, with the messiness only documented in a few people’s brains, repeated over and over. I still work on AI powered products and I don’t see even a little line of sight on this problem. Everyone’s data is immensely messy and likely to remain so. AI has introduced a number of tools to manage that mess, but so far it appears they’ll need to be exposed via fairly traditional UIs. |
I can't get anyone to listen to this point. I'm seeing plans going full steam ahead deploying AI when they don't even have a good definition of the PROBLEM much less how to train the AI to do things well and correctly. I was in a 90 minute meeting with some execs who were all high on ChatGPT Operators. He was saying we could replace 80 people at this company RIGHT NOW with this tool. I asked the presenter to type in one simple request to the AI, the entire demo went wildly off the rails from then on and the presenter wasn't even remotely bothered by that. People are either completely taken in by the marketing and believe like it's a religion, or they have solid, sensible concerns about reliability. But the number of people in category 2 is a smaller number than the true believers.