Hacker News
by mjmahone17 4578 days ago
This is interesting, but given your parameters (predict the most friendships), all you're technically asking for is recall. I'll write an algorithm that has 100% recall: predict that all people become friends with each other.

If this is really a competition (and not just "Here, have fun with our dataset!"), you need to define the rules a little bit more clearly. How are you weighing recall vs. precision? Or are you just looking at % correct labels, where the only two labels possible are "FRIENDS" and "NOT FRIENDS"?
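To make the recall-vs-precision point concrete, here is a minimal Python sketch on made-up labels. The function name `precision_recall` and the toy data are hypothetical, not taken from the dataset. Predicting that everyone becomes friends gets perfect recall, while its precision is just the fraction of pairs that actually became friends.

```python
def precision_recall(predictions, labels):
    """Precision and recall for binary FRIENDS (True) / NOT FRIENDS (False) labels."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    # Guard against division by zero when nothing is predicted positive
    # or nothing is actually positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy labels: half of the pairs became friends.
labels = [True, False, True, False, True, False]
everyone_friends = [True] * len(labels)

precision, recall = precision_recall(everyone_friends, labels)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=1.00
```

Any single number (accuracy, F1, or a weighted mix) would close the loophole; the point is that the rules have to name one.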


Sorry, this was unclear. We meant "correctly predict the most friendships."

You get 1 point for each pair that you correctly predict did or did not become friends. In the test data set, about 50% of the 500 pairs became friends, so predicting "everyone became friends" would get 250 points, whereas a perfect algorithm would get 500 points.

I'm updating the README now to make our scoring system more clear.
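The scoring rule above is plain accuracy over pairs. Here is a minimal sketch, assuming predictions and labels are parallel lists of booleans and a test set of 500 pairs split evenly, which is the split implied by the numbers above. `score` is a hypothetical name, not the competition's actual scorer.

```python
def score(predictions, labels):
    """One point for each pair whose outcome (friends or not) is predicted correctly."""
    if len(predictions) != len(labels):
        # zip() would silently drop the extra items, so fail loudly instead.
        raise ValueError("predictions and labels must have the same length")
    return sum(1 for p, y in zip(predictions, labels) if p == y)


# Hypothetical test set: 500 pairs, half of which became friends.
labels = [True] * 250 + [False] * 250

print(score([True] * 500, labels))  # "everyone became friends" -> 250
print(score(labels, labels))        # perfect predictor -> 500
```

With a 50/50 split, "everyone became friends" and "nobody became friends" both score 250, so neither trivial baseline beats chance.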

They're also looking for whether people become friends on Facebook.

The dominant factor here is going to be the rate at which the participants send and accept friend requests on Facebook. Some people send them to everyone they meet; others never use Facebook.

KPI overfitting, yay!

(The best second-order effect is probably a multi-feature similarity measure between the participants and the person's current Facebook Friends, including graph distance to current Friends. In case anyone is taking a run at this.)
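One possible reading of that second-order feature, as a minimal sketch: BFS hop distance from the candidate to each of the person's current friends, plus Jaccard similarity over profile attributes. The adjacency-dict graph, the attribute sets, and the names `pair_features`, `graph_distance`, and `jaccard` are all hypothetical; the dataset's real format may differ. It needs Python 3.8+ for the `:=` operator.

```python
from collections import deque


def graph_distance(graph, source, target):
    """Shortest-path hop count between two people (BFS), or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def jaccard(a, b):
    """Jaccard similarity of two attribute sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(graph, attributes, person, candidate):
    """Hypothetical features for whether `person` befriends `candidate`."""
    friends = graph.get(person, set())
    # Closest hop distance from any of the person's current friends to the candidate.
    distances = [d for f in friends
                 if (d := graph_distance(graph, f, candidate)) is not None]
    # Highest attribute similarity between the candidate and any current friend.
    similarity = max(
        (jaccard(attributes.get(f, set()), attributes.get(candidate, set()))
         for f in friends),
        default=0.0,
    )
    return {
        "min_distance_to_friends": min(distances, default=None),
        "max_similarity_to_friends": similarity,
    }


# Hypothetical undirected graph (symmetric adjacency sets) and attributes.
graph = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": {"bob"},
}
attributes = {
    "alice": {"cs", "chess"},
    "bob": {"cs", "running"},
    "carol": {"cs", "running"},
}
print(pair_features(graph, attributes, "alice", "carol"))
# {'min_distance_to_friends': 1, 'max_similarity_to_friends': 1.0}
```

Taking the minimum distance and maximum similarity over current friends is only one design choice; averages or counts within k hops are equally plausible features to try.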