| > potential applications
> if you ...
> for example ... Yes there seems to be lots of potential. Yes we can brainstorm things that should work. Yes there is a lot of examples of incredible things in isolation. But it's a little bit like those youtube videos showing amazing basketball shots in 1 try, when in reality lots of failed attempts happened beforehand. Except our users experience the failed attempts (LLM replies that are wrong, even when backed by RAG) and it's incredibly hard to hide those from them. Show me the things you / your team has actually built that has decent retention and metrics concretely proving efficiency improvements. LLMs are so hit and miss from query to query that if your users don't have a sixth sense for a miss vs a hit, there may not be any efficiency improvement. It's a really hard problem with LLM based tools. There is so much hype right now and people showing cherry picked examples. |
This has been my team's experience (and frustration) as well, and has led us to look at using LLMs for classifying / structuring, but not entrusting an LLM with making a decision based on things like a database schema or business logic.
I think the technology and tooling will get there, but the enormous amount of effort spent trying to get the system to "do the right thing" and the nondeterministic nature have really put us into a camp of "let's only allow the LLM to do things we know it is rock-solid at."