Hacker News
by karpathy 3887 days ago
I screwed up on this point, by the way: I did this part of the experiment a few months ago and misremembered the details. I went back through the code and updated the post with more detail on this important point. In particular:

"Now it is time to decide which ones of those selfies are good or bad. Intuitively, we want to calculate a proxy for how many people have seen the selfie, and then look at the number of likes as a function of the audience size. I took all the users and sorted them by their number of followers. I gave a small bonus for each additional tag on the image, assuming that extra tags bring more eyes. Then I marched down this sorted list in groups of 100, and sorted those 100 selfies based on their number of likes. I only used selfies that were online for more than a month to ensure a near-stable like count. I took the top 50 selfies and assigned them as positive selfies, and I took the bottom 50 and assigned those to negatives. We therefore end up with a binary split of the data into two halves, where we tried to normalize by the number of people who have probably seen each selfie. In this process I also filtered people with too few followers or too many followers, and also people who used too many tags on the image."
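
For anyone who wants to see the procedure quoted above as code, here is a rough sketch in Python. This is not the original code. The `Selfie` fields, the `audience_proxy` formula, the size of the tag bonus, and every threshold default are hypothetical placeholders, since the post gives no exact values. Dropping the trailing incomplete group is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Selfie:
    # Hypothetical record layout; the post does not describe the real one.
    selfie_id: str
    followers: int
    num_tags: int
    likes: int
    days_online: int


def audience_proxy(s: Selfie, tag_bonus: float = 0.02) -> float:
    # Assumed form of the "small bonus for each additional tag":
    # each tag scales the follower count up by a small fraction.
    return s.followers * (1.0 + tag_bonus * s.num_tags)


def label_selfies(
    selfies: List[Selfie],
    group_size: int = 100,
    min_followers: int = 100,      # placeholder threshold
    max_followers: int = 100_000,  # placeholder threshold
    max_tags: int = 10,            # placeholder threshold
    min_days_online: int = 30,
    tag_bonus: float = 0.02,       # placeholder bonus size
) -> Tuple[List[Selfie], List[Selfie]]:
    # Filter: old enough for a stable like count, sane follower and tag counts.
    kept = [
        s for s in selfies
        if s.days_online > min_days_online
        and min_followers <= s.followers <= max_followers
        and s.num_tags <= max_tags
    ]
    # Sort by estimated audience so each group holds selfies with similar reach.
    kept.sort(key=lambda s: audience_proxy(s, tag_bonus))

    positives: List[Selfie] = []
    negatives: List[Selfie] = []
    half = group_size // 2
    # March down the sorted list in full groups; a trailing partial group is dropped.
    for start in range(0, len(kept) - group_size + 1, group_size):
        group = sorted(
            kept[start:start + group_size],
            key=lambda s: s.likes,
            reverse=True,
        )
        positives.extend(group[:half])  # top half by likes within the group
        negatives.extend(group[half:])  # bottom half by likes within the group
    return positives, negatives
```

The point of grouping before ranking is that a selfie is only compared against others with a similar estimated audience, so a selfie with fewer absolute likes can still land in the positive set if it beat its peers.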

1 comment

Still no men in the top 100? There must be something deep to learn about the difference between the sexes there; I'm just not sure what it is.