by galaxyLogic 18 days ago
I don't quite get it why they can't take another LLM and vet the output of the first with the second one. Surely they would not have the same hallucinations and would be able to detect hallucinations of the earlier LLM. Maybe it would cost too much in terms of tokens?

I don't know, but I would expect it to be relatively easy for an LLM to detect "hallucinations".

5 comments

> I don't quite get it why they can't take another LLM and vet the output of the first with the second one.

I think this may be part of the problem. The actual humans creating the report don't have the expertise to know which one to trust. At least that was what consulting was like in my experience at a similar firm.

> I don't quite get it why they can't take another LLM and vet the output of the first with the second one.

Yes, this technique and its variations[1][2] "work", but they're still not 100% reliable. And they're not as widely used as they might be because, among other reasons:

a. it takes longer to implement

b. it costs more (more tokens spread across multiple LLM calls)

c. it has higher latency (getting an answer takes longer because multiple LLM calls are involved)

d. the final answer is probabilistically more likely to be correct, but is still not guaranteed to be error free, so you can never fully escape the need for Human in the Loop.

[1]: https://en.wikipedia.org/wiki/LLM-as-a-Judge

[2]: https://github.com/karpathy/llm-council
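
For anyone who wants to see the shape of it, here is a minimal sketch of the "second LLM vets the first" loop. Everything in it is an assumption for illustration: `writer` and `judge` are hypothetical callables standing in for whatever model clients you use, and the PASS/FAIL reply convention is made up. It is not the method from [1] or [2], just the general idea, and it makes points b, c, and d visible: every round adds calls, and running out of rounds hands the draft to a human.

```python
from typing import Callable

# Hypothetical type: any function that takes a prompt and returns model text.
Model = Callable[[str], str]


def vetted_answer(question: str, writer: Model, judge: Model,
                  max_rounds: int = 2) -> dict:
    """Draft with one model, then ask a second model to vet the draft."""
    calls = 0
    draft = writer(question)
    calls += 1
    for _ in range(max_rounds):
        # Assumed convention: the judge starts its reply with PASS or FAIL.
        verdict = judge(
            "Reply PASS or FAIL, then give a reason.\n"
            f"Question: {question}\nAnswer: {draft}"
        )
        calls += 1
        if verdict.strip().upper().startswith("PASS"):
            # A PASS only means the judge found nothing; it is not a guarantee.
            return {"answer": draft, "calls": calls, "needs_human": False}
        # Feed the objection back to the writer and try again.
        draft = writer(f"{question}\nA reviewer objected: {verdict}\nRevise.")
        calls += 1
    # Out of rounds: the last draft was never approved, so escalate.
    return {"answer": draft, "calls": calls, "needs_human": True}


if __name__ == "__main__":
    # Stub models so the sketch runs without any API key.
    result = vetted_answer(
        "What is 6 * 7?",
        writer=lambda prompt: "42",
        judge=lambda prompt: "PASS: arithmetic checks out",
    )
    print(result)
```

Even the happy path costs two calls instead of one, and a judge that keeps objecting costs `1 + 2 * max_rounds`.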

I think the AIs don't have enough information about the problem. There are many things the people who wrote the prompts forgot to mention. And maybe some of it is tacit knowledge?

Then, it doesn't matter if you add 1000 frontier models -- they still can't generate a good report.

But yes, I suppose you can get rid of hallucinated citations.
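
A rough sketch of what I mean by catching hallucinated citations, assuming you have some trusted list of sources to compare against. `known_sources` is hypothetical (a set of URLs a human has confirmed exist), and the regex is a simplification. It only tells you a cited URL is not on the list; it says nothing about whether a real source actually supports the claim.

```python
import re

# Simplified URL matcher; stops at whitespace and common closing punctuation.
URL_PATTERN = re.compile(r"https?://[^\s\]\)>,]+")


def unverified_citations(report: str, known_sources: set[str]) -> list[str]:
    """Return cited URLs that do not appear in the trusted set."""
    # Drop a trailing period picked up from the end of a sentence.
    cited = [url.rstrip(".") for url in URL_PATTERN.findall(report)]
    return [url for url in cited if url not in known_sources]


if __name__ == "__main__":
    report = "See https://example.com/a and https://example.com/b."
    print(unverified_citations(report, {"https://example.com/a"}))
```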

I am not exactly sure this would solve the overall problem, the main one being lack of oversight. The solution to a social issue generally isn't to throw more technology at it.
IBM once said "a computer can never be held accountable. Therefore a computer must never make a management decision."

Because they used LLMs to do the work. What you are suggesting is to use the LLMs to create more work, which is counter to the shortcut they were trying to take.

Good point, with some irony. They don't want to do a better job, they want to do an easier job. But a company like E&Y should realize shortcuts like these don't work. And their customers are paying them.