| HN Mirror


by ileitch 5168 days ago

I assume each vector has its own weight? So "Better in both respects" is a stronger sign of similarity than just "Higher quality but same rewatchability."

So, say:

- "Same in both dimensions" = 0
- "Same quality but more rewatchable" = +1
- "Same quality but less rewatchable" = -1
- "Higher quality but less rewatchable" = +2
- "Higher quality but same rewatchability" = +3
- "Better in both respects" = +4
- etc.
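A minimal sketch of the encoding proposed above, assuming each comparison outcome arrives as one of the listed labels. The dictionary `WEIGHTS` and the helper `encode` are hypothetical names, and the outcomes hidden behind "etc." are deliberately left out rather than guessed.

```python
# Hypothetical encoding of the weights listed above. Outcomes covered by
# "etc." are not included, so encode() raises KeyError for them.
WEIGHTS = {
    "Same in both dimensions": 0,
    "Same quality but more rewatchable": 1,
    "Same quality but less rewatchable": -1,
    "Higher quality but less rewatchable": 2,
    "Higher quality but same rewatchability": 3,
    "Better in both respects": 4,
}


def encode(labels):
    """Turn a user's sequence of comparison labels into a list of weights."""
    return [WEIGHTS[label] for label in labels]


print(encode(["Better in both respects", "Same quality but less rewatchable"]))
# [4, -1]
```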

Then you could feed those weights into a correlation coefficient such as Pearson's r.

x = [0, 1, 2, -1, -3, 4, -4]

y = [0, 1, 1, 2, -1, -2, 0]
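As a sketch of that step, here is a plain Pearson's r computed over the two example vectors. `pearson_r` is a hypothetical helper written out by hand; a library routine would do the same job.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Covariance and spread, without the 1/n factors (they cancel out).
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when one sequence is constant")
    return cov / (sx * sy)


x = [0, 1, 2, -1, -3, 4, -4]
y = [0, 1, 1, 2, -1, -2, 0]
print(round(pearson_r(x, y), 3))  # -0.171
```

For these two users the weighted comparisons barely line up, so r comes out slightly negative.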

It'd be an interesting experiment to see what results that gives vs. your current algorithm.


geelen 5168 days ago

That's something we haven't tested, but my gut tells me that contrasting Vector Victors (like better in both dimensions) are 'worth' more than similar Vector Victors.

The really significant change would be that agreeing in one dimension (yes, A is better quality than B, but we disagree on which is more rewatchable) still contributes to your correlation with someone. We're not doing that at the moment because it felt like pairwise partial agreement would weaken the signal; I wanted _real_ agreement (in both dimensions) to stand out.

While there might be a way to capture that with a linear function, I've favoured solutions that reflect that our ratings are two-dimensional.
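To make the trade-off concrete, here is a hypothetical sketch (not the site's actual algorithm, which the thread doesn't describe) that treats each comparison as a two-dimensional (quality, rewatchability) vector with components -1, 0 or +1. Strict scoring only counts full agreement; partial scoring gives half credit for agreeing in one dimension. The function names and sample data are illustrative assumptions.

```python
# Each comparison is a hypothetical (quality, rewatchability) pair,
# with each component -1 (worse), 0 (same) or +1 (better).


def strict_agreement(a, b):
    """Share of comparisons where both users agree in both dimensions."""
    matches = sum(1 for u, v in zip(a, b) if u == v)
    return matches / len(a)


def partial_agreement(a, b):
    """Average share of dimensions agreed on; one-dimension agreement scores 0.5."""
    total = sum(((u[0] == v[0]) + (u[1] == v[1])) / 2 for u, v in zip(a, b))
    return total / len(a)


a = [(1, 1), (1, -1), (0, 1)]
b = [(1, 1), (1, 1), (-1, 1)]
print(strict_agreement(a, b))   # 1 of 3 comparisons agrees fully
print(partial_agreement(a, b))  # the two half-agreements now count too
```

The strict version keeps full agreement prominent, which matches the stated goal; the partial version rewards single-dimension agreement at the cost of a noisier signal.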


ileitch 5168 days ago

Also, if you avoid the normalisation step, you could easily factor in the degree to which user A liked the quality compared with user B, instead of reducing it to a 'more' or 'less' question.

If you scale your vector weights by your quality rating (0-10?), then when user A rates film X's quality 6 points above film Y's but user B rates it only 1 point above, the correlation would capture that difference in degree and be more accurate.
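One way to read this suggestion, sketched under assumptions: keep the raw quality-rating differences (film X minus film Y on a 0-10 scale) instead of reducing them to their sign. The deltas below are made-up illustrative values, and `pearson_r` and `sign` are hypothetical helpers.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def sign(v):
    """-1, 0 or +1: the 'more or less' view of a rating difference."""
    return (v > 0) - (v < 0)


# Hypothetical quality differences for the same four film pairs.
a_deltas = [6, -2, 3, -1]
b_deltas = [1, -2, 4, -1]

direction_only = pearson_r([sign(d) for d in a_deltas], [sign(d) for d in b_deltas])
with_magnitude = pearson_r(a_deltas, b_deltas)
print(direction_only, round(with_magnitude, 2))  # 1.0 0.68
```

Both users always agree on direction, so the sign-only version reports perfect correlation; keeping magnitudes shows they disagree on how much better one film is, for example +6 versus +1 on the first pair.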

Anyway, food for thought. A very fun problem to be working on!


ileitch 5168 days ago

I think the weightings I describe above would give you that.

Say we start at 0, user A likes the next one more in both dimensions (+4), and user B likes it more in only one dimension and the same in the other (+2). You still get a positive correlation, just a slightly lower one than if both users liked it more in both dimensions.

R([0, 4, 4], [0, 4, 4]) = 1.0

R([0, 4, 4], [0, 2, 4]) ≈ 0.866

R([0, 4, 4], [0, -4, -4]) = -1.0
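A quick way to check coefficients like these is to compute them directly. This sketch uses a hand-written, hypothetical `pearson_r` helper; the exact values are 1, √3/2 ≈ 0.866 and -1, since the third vector is just the first one negated.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


a = [0, 4, 4]
for b in ([0, 4, 4], [0, 2, 4], [0, -4, -4]):
    print(b, round(pearson_r(a, b), 3))
# [0, 4, 4] 1.0
# [0, 2, 4] 0.866
# [0, -4, -4] -1.0
```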
