by clay_the_ripper 787 days ago
I think this fundamentally misunderstands how to use LLMs. Out of the box, an LLM is not an application - it’s only a building block for one. An application could be built that answers this question with 100% accuracy - but it would not rely solely on what’s in the training data. The training data makes the model “intelligent” but is not useful for accurate recall of this kind. Trying to fix this problem is not really the point - the shortcoming is well known and we have already found great solutions to it.

What are the solutions?

As pointed out in the article, some LLMs appear to know the information when asked to list episodes, then deny it later. These are general inconsistencies.

It is not about looking up trivia; it is the fact that you never know the competence level of any answer it gives you.

I think what the parent poster meant is that the most useful way to use today's LLMs is to accept their limitations and weaknesses and work around them. Better models will come, but for now this is what you have to do.

For example, use LLMs to transform text rather than generate it from scratch (where they are prone to hallucinate). A general-purpose chatbot is not a great use case!

For this particular Gilligan's Island task it'd be better to first retrieve the list of episode titles (or descriptions, if needed), then ask the LLM which of them was about "mind reading". There are various ways to do this sort of thing, depending on how specific or constrained your task is. In the most general case you could ask a powerful model like Claude Opus to create a plan composed of simpler steps, but in other cases your application already knows what it wants to do and how to do it, and will call an LLM as a tool only for the specific steps the model is capable of.
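
To make the retrieve-then-ask idea concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: `build_prompt`, `pick_episode`, and the `ask_llm` callable are hypothetical names, the three titles are a tiny hand-typed sample standing in for a real retrieval step, and the "LLM" is a stub so the example runs without any API. The point is only that the model chooses from a list you supplied, and the application rejects any reply that is not a valid choice.

```python
from typing import Callable, Optional, Sequence


def build_prompt(titles: Sequence[str], question: str) -> str:
    """Put the retrieved titles in the prompt so the model selects, not recalls."""
    numbered = "\n".join(f"{i}. {t}" for i, t in enumerate(titles, start=1))
    return (
        "Here is a list of episode titles:\n"
        f"{numbered}\n\n"
        f"Question: {question}\n"
        "Answer with the number of the best match, or 0 if none fits."
    )


def pick_episode(
    titles: Sequence[str],
    question: str,
    ask_llm: Callable[[str], str],  # hypothetical: any function from prompt to reply text
) -> Optional[str]:
    """Return the chosen title, or None if the reply is not a valid list index."""
    reply = ask_llm(build_prompt(titles, question)).strip()
    try:
        choice = int(reply)
    except ValueError:
        return None  # free-form reply: refuse rather than guess
    if 1 <= choice <= len(titles):
        return titles[choice - 1]
    return None  # 0 ("none fits") or out of range


# Small sample list typed in by hand; a real application would fetch the
# full list from a database or episode guide instead.
SAMPLE_TITLES = ["Two on a Raft", "Seer Gilligan", "The Producer"]


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call; always answers "2"."""
    return "2"


if __name__ == "__main__":
    print(pick_episode(SAMPLE_TITLES, "Which episode is about mind reading?", stub_llm))
```

Because the answer has to be an index into the retrieved list, the model cannot invent an episode that does not exist; the worst case is picking the wrong real one or returning nothing.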