by efskap
29 days ago
It reminds me of how LLM hallucination is attributed to "I don't know" being underrepresented in training data, and to guessing being a better strategy on evaluations than admitting not knowing. Different reward function, but the same behaviour emerges.
The idea is that you generate fake LLM transcripts using your classical training data. E.g. look at some training data and generate Q/A transcripts from it. Generate random questions, RAG against your whole dataset and look for relevant material; if there is nothing there, train an "I don't know." reply.
A moderately sized LLM that operates some tools behind the scenes to access more information, run tests, and correct its own errors can write transcripts simulating a much larger and smarter LLM.
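
Rough sketch of the "I don't know" part, to make it concrete. Everything here is made up for illustration: the names (`retrieve`, `make_transcript`), the 0.5 threshold, and the toy corpus are my assumptions. Word overlap stands in for real RAG retrieval, and the retrieved passage is used verbatim as the answer, where a real pipeline would presumably have an LLM write it.

```python
import re

IDK = "I don't know."

def tokens(text):
    """Lowercase word set; a crude stand-in for an embedding (assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, corpus):
    """Return (score, passage) for the passage covering the most question words.

    The score is the fraction of question tokens found in the passage.
    """
    q = tokens(question)
    best_score, best_passage = 0.0, None
    for passage in corpus:
        score = len(q & tokens(passage)) / len(q) if q else 0.0
        if score > best_score:
            best_score, best_passage = score, passage
    return best_score, best_passage

def make_transcript(question, corpus, threshold=0.5):
    """Build one synthetic Q/A pair; fall back to IDK when nothing relevant is found.

    The threshold is a hypothetical cut-off, not a tuned value.
    """
    score, passage = retrieve(question, corpus)
    answer = passage if score >= threshold else IDK
    return {"question": question, "answer": answer}

# Toy dataset and questions, purely illustrative.
corpus = [
    "The Eiffel Tower is in Paris.",
    "Water boils at 100 degrees Celsius at sea level.",
]
questions = ["Where is the Eiffel Tower?", "Who won the 1987 chess olympiad?"]
transcripts = [make_transcript(q, corpus) for q in questions]
```

The first question gets the matching passage as its answer; the second has no support in the corpus, so it becomes an "I don't know." training example.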