| HN Mirror


by geelen 5167 days ago

It's not actually linear correlation, since we effectively normalise the pairwise scores to [-1,0,1],[-1,0,1] (nine possible combos). We're exploring blending in a few other signals along the way, but we wanted to see how far we could get by discretising the pairwise comparisons in this way.

Once we've collapsed all pairs down to a Vector Victor, we treat matching Vector Victors as a thumbs up and non-matching ones as a thumbs down, take the square root of both counts, then take the lower bound of the Wilson interval as our ranking function.
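A minimal sketch of that pipeline, to make the steps concrete. Several things here are assumptions rather than the actual implementation: ratings are stored as (quality, rewatchability) tuples per film, "the square root of both" is read as square-rooting the match and mismatch counts before they go into the interval, and z = 1.96 (a 95% interval) is used. The names `victor`, `wilson_lower_bound` and `similarity` are hypothetical.

```python
import math


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def victor(film_a, film_b):
    """Discretise one user's comparison of two films.

    Each film is a (quality, rewatchability) tuple; the result is one of
    the nine possible (-1/0/1, -1/0/1) combinations.
    """
    return (sign(film_a[0] - film_b[0]), sign(film_a[1] - film_b[1]))


def wilson_lower_bound(ups, total, z=1.96):
    """Lower bound of the Wilson score interval for ups out of total."""
    if total == 0:
        return 0.0
    p = ups / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom


def similarity(user_a, user_b):
    """Score two users by how often their pairwise comparisons match exactly."""
    shared = sorted(set(user_a) & set(user_b))
    ups = downs = 0
    for i, f in enumerate(shared):
        for g in shared[i + 1:]:
            if victor(user_a[f], user_a[g]) == victor(user_b[f], user_b[g]):
                ups += 1      # both dimensions agree: thumbs up
            else:
                downs += 1    # any disagreement: thumbs down
    # ASSUMPTION: the square root is applied to the two counts themselves.
    ups, downs = math.sqrt(ups), math.sqrt(downs)
    return wilson_lower_bound(ups, ups + downs)
```

Under this reading, only exact agreement in both dimensions counts towards the score, and the square root damps the effect of users who share a very large number of pairs.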

More questions? Shoot!

1 comments

ileitch 5167 days ago

I assume each vector has its own weight? So "Better in both respects" is a stronger sign of similarity than just "Higher quality but same rewatchability."

So say..
"Same in both dimensions" = 0
"Same quality but more rewatchable." = +1
"Same quality but less rewatchable." = -1
"Higher quality but less rewatchable." = +2
"Higher quality but same rewatchability." = +3
"Better in both respects." = +4
etc..

Then you could pass those to a coefficient like Pearson's R.

x = [0, 1, 2, -1, -3, 4, -4]
y = [0, 1, 1, 2, -1, -2, 0]

It'd be an interesting experiment to see what results that gives vs. your current algorithm.
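A rough sketch of that weighting as a lookup table, keyed by the (quality, rewatchability) sign pair. Only the six weights listed above come from the comment; the three "lower quality" entries are an assumption (mirrored negatives, which at least matches the -3 and -4 appearing in x). The names `WEIGHTS`, `score` and `encode` are hypothetical.

```python
# Weight for each discretised comparison, keyed by
# (quality_sign, rewatchability_sign).
WEIGHTS = {
    (0, 0): 0,     # same in both dimensions
    (0, 1): 1,     # same quality but more rewatchable
    (0, -1): -1,   # same quality but less rewatchable
    (1, -1): 2,    # higher quality but less rewatchable
    (1, 0): 3,     # higher quality but same rewatchability
    (1, 1): 4,     # better in both respects
    # ASSUMPTION: the comment trails off with "etc..", so these three
    # are mirrored negatives and not taken from the source.
    (-1, 1): -2,   # lower quality but more rewatchable
    (-1, 0): -3,   # lower quality but same rewatchability
    (-1, -1): -4,  # worse in both respects
}


def score(sign_pair):
    """Collapse one two-dimensional comparison into a single weight."""
    return WEIGHTS[sign_pair]


def encode(sign_pairs):
    """Turn a user's sequence of comparisons into a list of weights."""
    return [score(pair) for pair in sign_pairs]


# One user's comparisons over three film pairs, as sign pairs.
print(encode([(0, 0), (0, 1), (1, -1)]))
```

Two users' encoded lists, taken over the same film pairs in the same order, would be the x and y handed to a correlation coefficient. With the assumed mirroring the table happens to equal 3 * quality_sign + rewatchability_sign, so quality dominates the ordering.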


geelen 5167 days ago

That's something we haven't tested, but my gut tells me contrasting Vector Victors (like better in both dimensions) is 'worth' more than similar Vector Victors.

The really significant change would be that agreeing in one dimension (yes A is better quality than B, but we disagree on which is more rewatchable) still contributes to your correlation with someone. We're not doing that at the moment, because it felt like pairwise partial agreement would weaken the signal - I wanted _real_ agreement (in both dimensions) to stand out.

While there might be a way to capture that with a linear function, I've favoured solutions that reflect that our ratings are two-dimensional.


ileitch 5167 days ago

Also, if you avoid the normalisation step you could easily factor in the degree to which user A liked the quality vs. user B, instead of just a 'more' or 'less' question.

If you scale your vector weights by the size of the difference on your quality rating scale (0 - 10?), then a case where user A rated the quality of film X +6 points above film Y while user B rated it only +1 above would no longer count as identical, and this would give you a more accurate correlation.

Anyway, food for thought. A very fun problem to be working on!
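To illustrate the difference, a small sketch comparing the sign-only comparison with one that keeps the magnitude. The 0 to 10 scale is the guess made above, not a confirmed detail, and the ratings, film names and function names are made up for the example.

```python
SCALE_MAX = 10  # ASSUMPTION: quality is rated on a 0-10 scale


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def quality_delta(ratings, film_x, film_y):
    """Raw difference in quality rating between two films for one user."""
    return ratings[film_x] - ratings[film_y]


def scaled_delta(ratings, film_x, film_y):
    """The same difference, scaled into [-1, 1] by the size of the scale."""
    return quality_delta(ratings, film_x, film_y) / SCALE_MAX


# Hypothetical quality ratings: A prefers X by 6 points, B by only 1.
user_a = {"X": 9, "Y": 3}
user_b = {"X": 6, "Y": 5}

for name, ratings in (("A", user_a), ("B", user_b)):
    delta = quality_delta(ratings, "X", "Y")
    print(name, "sign:", sign(delta), "scaled:", scaled_delta(ratings, "X", "Y"))
```

After normalisation both users land on the same value, while the scaled difference keeps them apart.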


ileitch 5167 days ago

I think the weightings I describe above would give you that.

Say we start at 0 and user A likes the next movie in both directions (+4) and user B only likes it more in one direction and the same in the other (+2), then you're still going to get a positive correlation, just a slightly lower one than if both users liked it in both directions.

R([0, 4, 4], [0, 4, 4]) = 1.0

R([0, 4, 4], [0, 2, 4]) = 0.866

R([0, 4, 4], [0, -4, -4]) = -1.0
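For anyone who wants to re-run these comparisons, a plain implementation of Pearson's r using only the standard library. The function name `pearson_r` is just for this sketch, and it assumes R above means the ordinary Pearson product-moment coefficient.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


user_a = [0, 4, 4]
for user_b in ([0, 4, 4], [0, 2, 4], [0, -4, -4]):
    print(user_a, user_b, round(pearson_r(user_a, user_b), 3))
```

Note that r only measures linear association, so any sequence that is an exact negative multiple of the other comes out at the minimum value regardless of scale.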
