Hacker News
by resource_waste 778 days ago
I feel like this is the perfect application for running the same input through models multiple times.

Imagine having ~10-100 different LLMs: maybe some are medical, maybe some are general, some are trained on a different language. Have them all run it, then rank the answers.
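A rough sketch of the "run them all, rank the answers" idea, assuming ranking simply means counting how many models agree (majority vote). The models here are hypothetical stand-ins (any function from prompt to answer string); no real LLM client is shown.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical: a "model" is any callable mapping a prompt to an answer string.
Model = Callable[[str], str]


def rank_answers(models: Sequence[Model], prompt: str) -> list[tuple[str, int]]:
    """Ask every model the same prompt and rank answers by how many models gave them."""
    # Light normalisation so trivial formatting differences don't split the vote.
    answers = [m(prompt).strip().lower() for m in models]
    # most_common() returns (answer, count) pairs, highest count first.
    return Counter(answers).most_common()


# Toy stand-in models returning fixed diagnoses.
toy_models = [
    lambda p: "Influenza",
    lambda p: "influenza ",
    lambda p: "Common cold",
    lambda p: "Influenza",
]
print(rank_answers(toy_models, "Fever, aches, dry cough. Likely diagnosis?"))
# → [('influenza', 3), ('common cold', 1)]
```

Free-text answers rarely match exactly, so a real setup would need a smarter way to group equivalent answers (or a judge model to do the ranking).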

Now I believe this can be amplified further by having another prompt ask the model to confirm the previous answer. This could get a bit insane computationally with 100 original answers, but I believe the original paper I read found that by repeating this prompt processing ~4 times, they got to about 95% accuracy.
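Something like the confirmation step could look like this. It's a sketch under my own assumptions: the prompt wording, the `CONFIRM` convention and `confirm_loop` are all hypothetical, not what the paper did. Each round the model either confirms the current answer (and we stop) or replaces it with a revision, for at most `rounds` rounds.

```python
from typing import Callable

# Hypothetical: a "model" is any callable mapping a prompt to a reply string.
Model = Callable[[str], str]


def confirm_loop(model: Model, question: str, answer: str, rounds: int = 4) -> str:
    """Repeatedly ask the model to confirm or revise an answer (hypothetical prompt format)."""
    for _ in range(rounds):
        reply = model(
            f"Question: {question}\nProposed answer: {answer}\n"
            "Reply CONFIRM if correct, otherwise reply with a corrected answer."
        ).strip()
        if reply.upper() == "CONFIRM":
            break  # the model accepts the current answer; stop early
        answer = reply  # adopt the revision and check it again next round
    return answer


# Toy stand-in: revises any answer to "flu", then confirms "flu".
def stub(prompt: str) -> str:
    return "CONFIRM" if "Proposed answer: flu" in prompt else "flu"


print(confirm_loop(stub, "Fever and aches?", "cold"))  # → flu
```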

So if 100 LLMs each give an answer and each answer is processed 4 more times, can we beat a 64-year-old doctor?
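Back-of-the-envelope cost, assuming "process it 4 times" means 4 extra confirmation prompts per answer and no early stopping (worst case): that's 100 × (1 + 4) = 500 model calls per question.

```python
def total_calls(n_models: int, confirm_rounds: int) -> int:
    """Worst-case LLM calls per question: one initial answer per model plus
    `confirm_rounds` confirmation prompts for each of those answers."""
    return n_models * (1 + confirm_rounds)


print(total_calls(100, 4))  # → 500
print(total_calls(10, 4))   # → 50
```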


Unfortunately, I don't believe accuracy will scale "multiplicatively". You'll typically only improve marginally beyond 95%... and how much is enough?
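A toy calculation of why voting doesn't multiply accuracy in practice. It assumes binary right/wrong answers and an odd number of voters. If 5 models at 95% made *independent* errors, a majority vote would be right about 99.88% of the time; but models trained on similar data tend to fail on the same cases. `correlated_accuracy` is a made-up correlation model (with probability `shared` every model makes the same call), only meant to show how shared errors eat the gain.

```python
from math import comb


def majority_accuracy(n: int, p: float) -> float:
    """P(strict majority of n independent voters is correct), each correct with prob p.
    Assumes a binary right/wrong outcome and odd n (no ties)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


def correlated_accuracy(n: int, p: float, shared: float) -> float:
    """Toy model (an assumption): with probability `shared` all voters make the
    same call (so accuracy is just p), otherwise they vote independently."""
    return shared * p + (1 - shared) * majority_accuracy(n, p)


print(f"{majority_accuracy(5, 0.95):.4f}")         # → 0.9988
print(f"{correlated_accuracy(5, 0.95, 0.5):.4f}")  # → 0.9744
print(f"{correlated_accuracy(5, 0.95, 1.0):.4f}")  # → 0.9500
```

With fully shared errors, adding more models buys nothing at all.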

Even such a system will still have some hallucination rate, so adding Deterministic Quoting on top would still help.
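Roughly, my understanding of the idea is that the model never writes out quoted text itself; it only points at a source passage, and ordinary code pastes in the verbatim text. A minimal sketch of that, where the `[[quote:ID]]` marker syntax, the `SOURCES` dict and `render_with_quotes` are all hypothetical:

```python
import re

# Hypothetical source passages, keyed by ID (e.g. retrieved from a patient record).
SOURCES = {
    "q1": "Patient reports fever of 39.2 C for three days.",
    "q2": "No known drug allergies.",
}

QUOTE_TAG = re.compile(r"\[\[quote:(\w+)\]\]")


def render_with_quotes(llm_output: str, sources: dict[str, str]) -> str:
    """Replace [[quote:ID]] markers in model output with verbatim source text.

    The quoted text never passes through the model, so it cannot be altered;
    unknown IDs are flagged rather than silently dropped."""
    def substitute(match: re.Match) -> str:
        qid = match.group(1)
        if qid in sources:
            return f'"{sources[qid]}"'
        return "[invalid quote reference]"

    return QUOTE_TAG.sub(substitute, llm_output)


print(render_with_quotes("Likely influenza, given [[quote:q1]]", SOURCES))
# → Likely influenza, given "Patient reports fever of 39.2 C for three days."
```

This guarantees the quote is real, not that the model picked the right quote or drew the right conclusion from it.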

It feels to me that we are a long way from LLM systems with negligible rates of hallucination.

A 95% diagnostic accuracy would be insane.

I believe I read that doctors are only at something like 30%...