by CuriouslyC 802 days ago
In the research literature, this process is done not by "agent" voting but by taking a similarity score between answers, and choosing the answer that is most representative.
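For anyone who wants to see what "most representative" could mean in practice, here is a minimal sketch. It is an assumption-laden toy, not the method from any particular paper: `bow_vector` is a hypothetical bag-of-words stand-in for a real embedding model, and "most representative" is taken to mean the answer with the highest total cosine similarity to all the others.

```python
import math
from collections import Counter


def bow_vector(text):
    # Toy stand-in for an embedding (assumption): lowercase word counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_representative(answers):
    # Pick the answer whose summed similarity to every other answer is highest.
    vecs = [bow_vector(a) for a in answers]

    def total_similarity(i):
        return sum(cosine(vecs[i], vecs[j]) for j in range(len(vecs)) if j != i)

    best = max(range(len(answers)), key=total_similarity)
    return answers[best]


# Hypothetical answers from three runs of the same prompt.
answers = [
    "the capital of france is paris",
    "paris is the capital of france",
    "i think it might be lyon",
]
print(most_representative(answers))
```

With a real embedding model you would swap out `bow_vector` and keep the selection step unchanged.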

Another approach is to use multiple agents to generate a distribution over predictions, a bit like Bayesian estimation.

3 comments

For my use case (generating an interesting H1), using a similarity score would defeat the purpose.

I'm looking for the diamond in the rough, which is often dissimilar from the others. With voting, the diamond can still get a high number of votes.

That approach definitely has promise. I would have agents rate answers and take the highest rated rather than vote for them, though, since voting (n choose 1) loses information about ranking and preference gradients. Also, you can do that whole process in one prompt; if you're currently re-prompting, it's cheaper to batch it up.
To clarify the first part: the research suggests you can run the same prompt multiple times and use those runs as the input for picking the answer.
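To make the rate-versus-vote difference concrete, here is a rough sketch. The agent names, candidate names, and scores are all made up for illustration, and the mean is just one way to aggregate ratings.

```python
from collections import Counter
from statistics import mean

# Hypothetical 1-10 scores: ratings[agent][candidate].
ratings = {
    "agent_a": {"h1_a": 9, "h1_b": 8, "h1_c": 2},
    "agent_b": {"h1_a": 3, "h1_b": 8, "h1_c": 9},
    "agent_c": {"h1_a": 9, "h1_b": 8, "h1_c": 3},
}


def pick_by_vote(ratings):
    # Each agent votes only for its top-scored candidate (n choose 1).
    votes = Counter(max(scores, key=scores.get) for scores in ratings.values())
    return votes.most_common(1)[0][0]


def pick_by_mean_rating(ratings):
    # Keep every score and take the candidate with the highest average.
    candidates = next(iter(ratings.values())).keys()
    return max(candidates, key=lambda c: mean(r[c] for r in ratings.values()))


print(pick_by_vote(ratings))
print(pick_by_mean_rating(ratings))
```

In this made-up data the vote goes to `h1_a` (two first-place picks), while the mean rating goes to `h1_b`, which nobody ranked first but everybody scored highly. That second-choice information is what a plain vote throws away.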
Any chance you could expand on both of these, even enough to assist in digging deeper into them? TIA.
The TLDR is you can prompt the LLM to take different perspectives than its default, then combine those. If the LLM is estimating a number, the different perspectives give you a distribution over the truth, which shows you the range of biases and the most likely true answer (given wisdom of the crowd). If the LLM is generating non-quantifiable output, you can find the "average" of the answers (using embeddings or other methods) and select that one.
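A minimal sketch of the numeric case, under stated assumptions: the perspective names and estimates below are invented, and in practice each number would come from a separate LLM call with a different persona or framing. The summary uses the median and quartiles as one simple way to describe the spread.

```python
from statistics import median, quantiles

# Hypothetical estimates of the same quantity from differently prompted runs.
estimates = {
    "optimist": 120.0,
    "pessimist": 60.0,
    "domain_expert": 95.0,
    "skeptic": 80.0,
    "default": 100.0,
}


def summarize(values):
    # Describe the distribution: range, quartiles, and median as the central guess.
    values = sorted(values)
    q1, _, q3 = quantiles(values, n=4)
    return {
        "low": values[0],
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "high": values[-1],
    }


summary = summarize(estimates.values())
print(summary)
```

The gap between `low` and `high` gives a rough picture of how much the framing biases the answer, and the median is a crowd-style point estimate that is less sensitive to one extreme perspective than the mean.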
Ah ok, so both are implemented via one or more calls to the LLM, as opposed to a standard algorithmic approach?
Once you have Bayesian prior distributions (which it makes total sense for LLMs to estimate), you can apply tons of nifty statistical techniques. It's only the bottom layer of the analysis stack that's LLM generated.