by athrowaway3z
4 days ago
> LLMs routinely fail at our business specifics: Local tax regulations, particularities of the accounting process, specifics of our ledger implementations.

So there is a spectrum here, and I don't know what I don't know, meaning I can just be wrong. But we're both on that spectrum, and are you sure it's not a skill issue? All of the specifics you list seem so fundamental that in similar projects I've inserted them straight into the AGENTS.md, or added a strong reference to where to look them up. If you boil it down, you're quite literally saying the problem is that the LLMs don't have access to a bunch of facts.
Well yeah, and our problem with mortality is not having access to a bunch of medical facts :)
Kidding aside, it's a fair question. We have several problems with that approach:
One, today they miss X, tomorrow Y. You can iteratively add information and get better, but everyone who's had to keep a large company's documentation updated and consistent knows how hard a problem that is. Still, this is not the main issue.
Two, knowledge extraction is not clean. We face this daily. "There was no incident on May 12" could mean any of:
- "There was no incident on May 12"
- "There was an incident I was not aware of"
- "There was an incident, but I'm a contractor who has to pay if there's an incident so I'm not admitting shit"
- "There may have been an incident, who knows, I secretly told ChatGPT to handle this task for me"
- "Something went wrong but I don't consider it an incident because that particular error has been popping up every Wednesday since I joined the company and I was told to ignore it"
- "There was an incident when I touched something you told me not to touch so I will firmly deny there was an incident"
You won't get the LLM to navigate that human problem. You might think that's tech debt and dysfunction, but it is real life. It's the same problem as with self-driving cars: it's semi-easy until you introduce toddlers running after a ball in the middle of the road, drunk drivers, and unfixed potholes.
Three, and this is the main issue: surfacing. Skills, agents, etc. work for obvious connections like "I'm writing a test => we test with framework x in a style y". They do not work as well for indirect connections like: "If I correct the interest on these past payments, for a minority of them it might raise the total amount above a certain threshold where we were supposed to have required extra information due to money laundering regulations, and I need to contact legal to see what we do since it's not possible to request the extra info after the fact"
The problem is that the set of things to potentially surface is giant, and LLMs fail miserably at connecting what to surface where. It's what we usually refer to as the "spidey sense"/"shitdar" of senior engs. LLMs might get better with time, but so far the ability isn't there.
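
To make the interest-correction example concrete, here is a minimal Python sketch of the check someone would need to think of. Everything in it is a hypothetical stand-in, not anything from a real ledger: the `Payment` fields, the `AML_THRESHOLD` value, and the `newly_over_threshold` helper are all assumptions made for illustration, and "above" is read as strictly greater than the threshold.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical threshold; the real value depends on jurisdiction and regulation.
AML_THRESHOLD = Decimal("10000.00")


@dataclass(frozen=True)
class Payment:
    # Hypothetical minimal shape of a ledger payment.
    payment_id: str
    principal: Decimal
    interest: Decimal

    @property
    def total(self) -> Decimal:
        return self.principal + self.interest


def newly_over_threshold(original, corrected, threshold=AML_THRESHOLD):
    """Return IDs of payments whose total was at or below the threshold
    before the interest correction and is strictly above it afterwards."""
    before = {p.payment_id: p.total for p in original}
    flagged = []
    for p in corrected:
        old_total = before.get(p.payment_id)
        if old_total is not None and old_total <= threshold < p.total:
            flagged.append(p.payment_id)
    return flagged


original = [
    Payment("a", Decimal("9990.00"), Decimal("5.00")),
    Payment("b", Decimal("5000.00"), Decimal("10.00")),
]
corrected = [
    Payment("a", Decimal("9990.00"), Decimal("15.00")),
    Payment("b", Decimal("5000.00"), Decimal("20.00")),
]
flagged = newly_over_threshold(original, corrected)
print(flagged)  # payments that need a conversation with legal
```

The check itself is a dozen lines once someone knows to ask for it. The hard part described above is realizing, while fixing interest amounts, that this question exists at all.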