by reactordev
337 days ago
I was going to mention this approach as well. The problem with the OP is that it has assumption bias, and the entire chain is built on that assumption. It's novel, but the original idea was to distribute scores more evenly so you can find real relevance, and I think 2AFC is better for that. I don't have time to verify it and post a paper about it, though.
|
It's definitely the best, if not the only, way to get extremely high signal and a score assignment that actually converges the more you sample.
In terms of the "F" in 2AFC, we actually have this amusing snippet from our prompt:
> Do NOT output a score of 0.0, ensure to focus on which document is superior, and provide a negative or positive float between -1.0 and 1.0.
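
For anyone who wants to see the "converges the more you sample" part concretely, here is a rough sketch, not our actual pipeline. It assumes a positive float means the first document is superior (the prompt snippet doesn't say which sign is which), and `simulated_judge`, `sample_scores`, and the hidden `quality` values are hypothetical stand-ins for a real model call.

```python
import random
from collections import defaultdict


def aggregate_pairwise(comparisons):
    """Average forced-choice preferences into one score per document.

    comparisons: iterable of (doc_a, doc_b, pref) with pref in [-1.0, 1.0].
    Assumption: pref > 0 means doc_a is superior, pref < 0 means doc_b is.
    A pref of exactly 0.0 is rejected, mirroring the "forced" in 2AFC.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for doc_a, doc_b, pref in comparisons:
        if pref == 0.0 or not -1.0 <= pref <= 1.0:
            raise ValueError(f"preference must be nonzero and in [-1, 1]: {pref}")
        totals[doc_a] += pref
        totals[doc_b] -= pref
        counts[doc_a] += 1
        counts[doc_b] += 1
    return {doc: totals[doc] / counts[doc] for doc in totals}


def simulated_judge(quality, doc_a, doc_b, rng, noise=0.3):
    """Hypothetical noisy judge: true quality gap plus Gaussian noise."""
    pref = quality[doc_a] - quality[doc_b] + rng.gauss(0.0, noise)
    pref = max(-1.0, min(1.0, pref))  # clamp into the allowed range
    if pref == 0.0:
        pref = rng.choice([-0.01, 0.01])  # never allow a tie
    return pref


def sample_scores(quality, n_samples, seed=0):
    """Draw random pairs, judge each one, and aggregate the results."""
    rng = random.Random(seed)
    docs = list(quality)
    comparisons = []
    for _ in range(n_samples):
        doc_a, doc_b = rng.sample(docs, 2)
        comparisons.append((doc_a, doc_b, simulated_judge(quality, doc_a, doc_b, rng)))
    return aggregate_pairwise(comparisons)


if __name__ == "__main__":
    hidden_quality = {"a": 0.9, "b": 0.5, "c": 0.1}  # made-up ground truth
    for n in (10, 100, 1000):
        print(n, sample_scores(hidden_quality, n, seed=1))
```

Plain averaging is just the simplest aggregation that shows the idea; something like a Bradley-Terry fit would be a reasonable swap if the pairs aren't sampled uniformly.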