Hacker News


	by Neywiny 341 days ago
	I have a paper that got rejected, but it was about using 2AFC (two-alternative forced choice) sorting to do this instead of Elo. Unlike Elo scores, it has a defined end. The code is on my GitHub and focuses on humans sorting images, but basically, if you have a Python sort function, you supply your comparison as the key instead of assigning the comparison a numeric score. Then the algorithm does the rest. Code: https://github.com/Neywiny/merge-sort Conference/abstract presentation: https://www.spiedigitallibrary.org/conference-proceedings-of...
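A minimal sketch of the idea in the comment above, assuming Python's `functools.cmp_to_key` as the way to hand a pairwise comparison to `sorted`. The names `ask_2afc` and `rank_by_2afc` are hypothetical stand-ins for illustration and are not taken from the linked repository; a real judge would ask a human or a model which of the two items is better.

```python
from functools import cmp_to_key


def ask_2afc(a, b):
    """Hypothetical forced-choice judge.

    Returns -1 if `a` should rank before `b`, otherwise 1. It never
    returns 0, so every comparison forces a choice. As a placeholder it
    orders strings by length, then alphabetically.
    """
    return -1 if (len(a), a) < (len(b), b) else 1


def rank_by_2afc(items, judge=ask_2afc):
    """Sort items using only pairwise judgments; also count the judgments."""
    comparisons = 0

    def counted(a, b):
        nonlocal comparisons
        comparisons += 1
        return judge(a, b)

    # cmp_to_key wraps the comparison so sorted() can use it as a key.
    ranked = sorted(items, key=cmp_to_key(counted))
    return ranked, comparisons


ranked, n_comparisons = rank_by_2afc(["ccc", "a", "bb", "dddd"])
print(ranked, n_comparisons)
```

The sort stops on its own after a bounded number of comparisons (on the order of n log n for n items), which is one way to read the "defined end" mentioned above. Nothing needs to be resampled until the scores settle.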

3 comments

ghita_ 341 days ago

would love to check out the code if you have it!


Neywiny 341 days ago

https://github.com/Neywiny/merge-sort

It was actually done to counter Elo-based approaches, so there are some references in the readme on how to prove who's better. I haven't run this code in 5 years and haven't developed on it in maybe 6, but I can probably fix any issues that come up. My co-author looks to have diverged a bit; I haven't checked out his code: https://github.com/FrankWSamuelson/merge-sort. There may also be a fork by the FDA itself, not sure. This work was done for the FDA's medical imaging device evaluation division.


reactordev 341 days ago

I was going to mention this approach as well. The problem with the OP is that it has assumption bias and the entire chain is based on that assumption. It’s novel. But the original idea was to more evenly distribute scores so you can find real relevance and I think 2AFC is better. But I don’t have time to verify and post a paper about it.


npip99 341 days ago

Yes, our pairwise method is based entirely on 2AFC comparisons, for both intra-query and inter-query Elo calculations.

It's definitely the best, if not the only, way to get extremely high signal and a score assignment that actually converges the more you sample.

In terms of the "F" in 2AFC, we actually have this amusing snippet from our prompt:

> Do NOT output a score of 0.0, ensure to focus on which document is superior, and provide a negative or positive float between -1.0 and 1.0.
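One way to read that prompt instruction in code, as a sketch only: treat the sign of the model's float as the forced choice and reject 0.0 rather than counting it as a tie. The function names, the convention that a positive score favors the first document, and the K-factor of 32 are all assumptions for illustration and are not the authors' implementation. The update itself is the standard Elo formula.

```python
def forced_choice(score):
    """Map a float in [-1.0, 1.0] to an outcome for document A.

    Assumed convention: positive means A is better (returns 1.0),
    negative means B is better (returns 0.0). A score of exactly 0.0
    is rejected, since a forced choice cannot be a tie.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1.0 and 1.0")
    if score == 0.0:
        raise ValueError("a forced choice cannot be a tie")
    return 1.0 if score > 0 else 0.0


def elo_update(rating_a, rating_b, outcome_a, k=32.0):
    """Standard Elo update for one comparison; k=32 is an assumed value."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    delta = k * (outcome_a - expected_a)
    return rating_a + delta, rating_b - delta


# Two documents start level; the judge prefers the first one.
new_a, new_b = elo_update(1500.0, 1500.0, forced_choice(0.4))
print(new_a, new_b)
```

This sketch discards the magnitude of the float and keeps only its sign. A scheme that also weights the update by magnitude is possible, but the thread does not say which one is used.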


reactordev 340 days ago

Nice, I use an epoch to prevent stalemate but this might be better.


Neywiny 341 days ago

It's probably because that's what we used, but nAFC has been my go-to since I first learned about it. Literally any time there's a ranking, even for dumb stuff like tier list videos on YouTube, the scores are too arbitrary. OK, you ranked this snack an 8/10. Based on what? And then they go back and say "actually I'm going to move that to a 7". AFC fixes all of that.


fc417fc802 339 days ago

I would have been curious to glance at the paper (poster? whatever it is) but it's paywalled. Is there any particular reason it isn't on arXiv?
