by bberenberg 483 days ago
I was actually looking at building something like this for analyzing different AI generated voices. I ended up going down a statistics rabbit hole to understand how to reduce the total number of comparisons to be made while still getting a good result. Have you considered how your tool can work with different metadata across the comparison set to reduce total comparisons needed? Also, have you considered accepting a CSV or something similar as input?
2 comments

This really is a deep rabbit hole and something I've played around with and considered devoting a lot of time to. Look into Expert Elicitation, Decision Theory and Order Theory.

There is no one-size-fits-all. This is the most important thing to keep in mind from the start.

This type of ranking is really all about UX. The math is just a tool to make it easier. It's a real trap to find some theory and think this will solve things, but if it doesn't actually make it easier for people to make decisions, you really didn't solve the problem.

Sometimes it looks like stack ranking would help. But, often you don't really need a stack. Maybe you just need the top one or the top N. Maybe each item has a weight and you want to fit the most value for a given weight allocation (knapsack problem). Maybe the weights and values aren't actually known, just relatively (this one is more work and more valuable than that one). Maybe value is compounding, like u({A, B}) > u({A}) + u({B}).
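
To make the knapsack case concrete, here is a minimal sketch of the classic 0/1 knapsack dynamic program. It assumes integer weights and known values, which is the easy version of the problem; the function name `best_subset` and the example items are made up for illustration.

```python
def best_subset(items, capacity):
    """Pick the subset of items with the highest total value within `capacity`.

    items: list of (name, weight, value) tuples with positive integer weights.
    Returns (total_value, list_of_names).
    """
    # best[c] holds the best (value, names) found so far with total weight <= c.
    best = [(0, [])] * (capacity + 1)
    for name, weight, value in items:
        # Walk capacities downward so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            candidate = best[c - weight][0] + value
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - weight][1] + [name])
    return best[capacity]


if __name__ == "__main__":
    tasks = [("A", 3, 4), ("B", 4, 5), ("C", 2, 3)]
    print(best_subset(tasks, 5))
```

If the weights and values are only known relatively, this does not apply directly; you would first have to elicit rough numbers, which is its own UX problem.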

Maybe the preferences are circular, like A > B > C > A. But that's not possible! Well, that's what the user says and just throwing up an error screen probably won't fix it. You'll need to handle that gracefully.
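
One way to handle this gracefully is to detect the cycle and show it back to the user instead of erroring out. A rough sketch, assuming the answers are stored as (winner, loser) pairs; `find_cycle` is a hypothetical helper, not something from OP's tool.

```python
from collections import defaultdict


def find_cycle(preferences):
    """Return one preference cycle as a list of items, or None if there is none.

    preferences: iterable of (winner, loser) pairs.
    The returned list starts and ends with the same item, e.g. [A, B, C, A].
    """
    graph = defaultdict(list)
    for winner, loser in preferences:
        graph[winner].append(loser)

    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = defaultdict(int)  # every item starts as UNSEEN
    path = []                 # items on the current depth-first search path

    def visit(node):
        state[node] = IN_PROGRESS
        path.append(node)
        for nxt in graph.get(node, []):
            if state[nxt] == IN_PROGRESS:
                # nxt is already on the current path, so we closed a loop.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == UNSEEN:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in list(graph):
        if state[node] == UNSEEN:
            found = visit(node)
            if found:
                return found
    return None


if __name__ == "__main__":
    print(find_cycle([("A", "B"), ("B", "C"), ("C", "A")]))
```

With the cycle in hand you can ask the user to revisit just those comparisons, or treat the items in it as tied.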

My suggestion is to really stick to one specific problem and solve for that, versus something general. Also allow the input to be rich. Rather than a win/lose, you might be better off with -2, -1, 0, +1, +2 in comparison (or words). Allow ties until they're actually a problem. Why make people struggle to choose between two options when neither of them ends up being used?
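
As a sketch of what richer input could look like in code: map the five-point answer to a fractional outcome, with 0 as a tie. The linear mapping below is purely my assumption; the right spacing depends on what the words mean to your users.

```python
# Assumed linear mapping from a -2..+2 answer to a fractional outcome for A.
GRADE_TO_SCORE = {-2: 0.0, -1: 0.25, 0: 0.5, 1: 0.75, 2: 1.0}


def graded_scores(grade):
    """Return (score_for_a, score_for_b) for a grade given from A's side.

    +2 means "A is much better", -2 means "B is much better", 0 is a tie.
    """
    if grade not in GRADE_TO_SCORE:
        raise ValueError(f"grade must be one of {sorted(GRADE_TO_SCORE)}")
    score_a = GRADE_TO_SCORE[grade]
    return score_a, 1.0 - score_a


if __name__ == "__main__":
    print(graded_scores(1))
```

Any rating update that accepts fractional outcomes can take these in place of a hard 1 or 0.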

It can also help to see things as probabilistically better rather than strictly better. Elo scores help with this, like the other comment said.

Decision ability is a resource. Decision fatigue is real and sets in fast. Optimize for taking up as little of that as possible from the user, especially if that user is you.

I don't know how OP does it, but Elo ratings are a pretty good tool for this kind of thing. In each pairwise comparison, the selected option "wins", the other option "loses", and ratings are adjusted accordingly.
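
For reference, a minimal sketch of the standard Elo update for one comparison. The starting rating of 1500 and K-factor of 32 are conventional choices I'm assuming here, not anything OP has confirmed.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings, winner, loser, k=32.0, initial=1500.0):
    """Adjust `ratings` in place after `winner` was picked over `loser`."""
    r_win = ratings.get(winner, initial)
    r_lose = ratings.get(loser, initial)
    # The less expected the win was, the larger the adjustment.
    delta = k * (1.0 - expected_score(r_win, r_lose))
    ratings[winner] = r_win + delta
    ratings[loser] = r_lose - delta


if __name__ == "__main__":
    ratings = {}
    update_elo(ratings, "voice_a", "voice_b")
    print(ratings)
```

Two items that start level move 16 points apart in each direction after one comparison; an upset against a much higher-rated item moves both by nearly the full K.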

You can incorporate priors by setting initial ratings differently, or force correlations between items by treating a win against option X as something like 0.8 wins against option X and 0.2 wins against options correlated with X.
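
A rough sketch of both ideas, assuming a plain Elo update underneath. The `correlated` mapping, the 0.8/0.2 split and the helper names are illustrative assumptions, and splitting the remaining 0.2 evenly across correlated items is just one possible choice.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def weighted_win(ratings, winner, loser, weight, k=32.0):
    """Apply a fraction `weight` of a normal Elo win of `winner` over `loser`."""
    delta = weight * k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


def record_win(ratings, winner, loser, correlated, main_weight=0.8, k=32.0):
    """Credit most of a win against `loser`, the rest against correlated items.

    correlated: dict mapping an item to the items assumed to be similar to it.
    """
    others = [o for o in correlated.get(loser, []) if o != winner]
    if not others:
        # Nothing correlated with the loser: count it as one full win.
        weighted_win(ratings, winner, loser, 1.0, k)
        return
    weighted_win(ratings, winner, loser, main_weight, k)
    share = (1.0 - main_weight) / len(others)
    for other in others:
        weighted_win(ratings, winner, other, share, k)


if __name__ == "__main__":
    # Priors: start an item you already believe is strong above the default.
    ratings = {"A": 1500.0, "X": 1550.0, "Y": 1500.0}
    record_win(ratings, "A", "X", correlated={"X": ["Y"]})
    print(ratings)
```

Each partial update moves the same number of points from loser to winner, so the total rating in the pool stays constant.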