This seems like a dramatic illustration of cumulative advantage, as studied in the famous MusicLab experiment [1, 2]. The argument is that in cultural and social markets, random effects govern which products or artifacts get the initial few "upvotes" (or their analogs), at which point the rich-get-richer dynamic takes over. Very, very awesome.
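A minimal sketch of the rich-get-richer dynamic described above, assuming a simple Pólya-urn-style model rather than the actual MusicLab design: every item starts out identical, and each new vote goes to an item with probability proportional to its current votes plus one. The function name `simulate_market` and all parameters are hypothetical, chosen only for illustration.

```python
import random


def simulate_market(n_items=20, n_votes=2000, seed=0):
    """Simulate cumulative advantage among identical items.

    Assumption: each vote picks an item with probability proportional
    to (current votes + 1). This is an illustrative urn model, not the
    procedure used in the MusicLab experiment.
    """
    rng = random.Random(seed)
    votes = [0] * n_items
    for _ in range(n_votes):
        # Early random wins raise an item's weight for every later vote.
        weights = [v + 1 for v in votes]
        winner = rng.choices(range(n_items), weights=weights, k=1)[0]
        votes[winner] += 1
    return votes


if __name__ == "__main__":
    # Different seeds stand in for different "worlds" of the same market.
    for seed in range(3):
        votes = simulate_market(seed=seed)
        top = max(range(len(votes)), key=votes.__getitem__)
        share = votes[top] / sum(votes)
        print(f"seed={seed} top item={top} share of votes={share:.2f}")
```

Since the items have no quality differences in this model, any difference between runs in which item ends up on top comes from the random early votes.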
The work isn't complete yet (always more to do) but the TL;DR is:
1. Yes, randomness governs a lot of article outcomes. Whether something hits the front page or not is pretty arbitrary.
2. However, conditioned on making the front page, popularity is actually a good reflection of "intrinsic quality". I think the ultimate relationship between popularity and quality is stronger than the MusicLab experiment suggests.
I like this a lot! If I allowed myself an explanation, I'd say your findings make a lot of sense, but I will instead heed the title of Duncan Watts' latest, "Everything Is Obvious: Once You Know the Answer" :)
I wonder how much engagement you'd get if you made a browser plugin or even an alternative website that showed users a random selection from the top three pages of reddit/HN, then intercepted and logged their upvotes, to get a direct measure of intrinsic quality, rather than estimating this statistically. I for one would use such an interface.
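A rough sketch of the bookkeeping such a plugin would need, assuming "intrinsic quality" is estimated as upvotes per impression under random exposure. The names `sample_stories` and `ExposureLog` are hypothetical; nothing here touches a real reddit or HN API, and it ignores who chooses to install the plugin.

```python
import random
from collections import Counter


def sample_stories(story_ids, k, rng):
    """Pick k stories uniformly at random, ignoring their current rank."""
    return rng.sample(story_ids, k)


class ExposureLog:
    """Count how often each story was shown and how often it was upvoted."""

    def __init__(self):
        self.impressions = Counter()
        self.upvotes = Counter()

    def record(self, story_id, upvoted):
        self.impressions[story_id] += 1
        if upvoted:
            self.upvotes[story_id] += 1

    def upvote_rate(self, story_id):
        """Upvotes per impression, or None if the story was never shown."""
        shown = self.impressions[story_id]
        return self.upvotes[story_id] / shown if shown else None


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for the stories on the top three pages (90 is an assumption).
    top_three_pages = [f"story-{i}" for i in range(90)]
    log = ExposureLog()
    for story in sample_stories(top_three_pages, 5, rng):
        log.record(story, upvoted=False)  # replace with the user's real action
```

Because every story has the same chance of being shown, the upvote rate is not confounded by position on the page.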
On a side note, have you seen this work on predicting the growth of ongoing cascades on Facebook [1]? I'm fixing to see if their findings apply to the MusicLab data.
Thanks for the feedback (and the Watts reference). Building a plugin is definitely an interesting idea; I hadn't thought of that before. I guess the problems would be three-fold. First, I don't know how to do that :-). Second, there are probably a bunch of ethical concerns/IRB issues that would stand in the way of academic publishing (but that's not huge). Third, and this is the only fundamental issue, the self-selection into using that plug-in would bias the estimates of intrinsic quality. Still, it's a pretty good idea, but I'm currently trying to get access to more fine-grained data in other ways, so we'll see.
In terms of creating your own site, a few researchers [1,2] have already done this and have some interesting work. But even with that, you still have the problem of accounting for position bias within the site (like HN doesn't really know if you skimmed the title of an article and decided to ignore it, or never read the title at all). But the experimental power you get with that is pretty cool.
And I have totally read that Facebook cascades paper and have more than a few thoughts about it. In fact, I have adapted their prediction-style results to the MusicLab data and you get really strong predictive accuracy (like 90-95% in terms of predicting whether a song will eventually be above the median popularity). However the accuracy you would achieve on Reddit or Hacker News data is considerably lower. I didn't really include those results in my paper because I'm not sure how they fit yet.
If I had one critique (which is not really a critique but a comment), it is that the Facebook study doesn't really contradict Watts' point that popularity is hard to predict. The Facebook study shows that if you can observe the "initial conditions", then you can predict eventual outcomes pretty well, but that's directly in line with the rich-get-richer effect that Watts et al. demonstrate. To put it pithily, it's easy to predict who gets richer if you observe who is rich.
Anyway, I could geek out about this for a long time but feel free to drop me an email at stoddardg [at] gmail.com if you're interested in chatting some more.
(This post sounds a bit too negative but it's not meant to be. Sorry for my poor communication style. Thanks for the informative post.)
> People can vote stories up or down (posts get +1 for an upvote, -1 for a downvote)
People can't downvote submissions. They can flag submissions and they can flag as well as downvote comments.
> It’s not entirely clear how karma is assigned but it’s safe to assume this is a measure of status.
Each point of karma is a single upvote on a submission or a comment. My high karma is purely a result of very frequent commenting, and has nothing to do with status. (Other people's high karma combined with high average karma is probably an indicator that people respect them. Some people have a weirdly low average - I don't understand how ColinWright only has an average of 1.7 or rayiner only has an average of 2.7, for example).
A downvoted comment will reduce your karma by one point for each downvote. Flagged comments don't reduce your karma but may have other effects. A while ago I had a comment that got 50 downvotes - all of those were taken off my karma total. There was a problem where a controversial post might be flagkilled - limiting the total loss to whatever downvotes that post gets before it's flagkilled; or a controversial post just gets heavily downvoted but not flagkilled which means unlimited downvotes for the time that downvotes are available on that post. (Not sure if this has been changed yet, or if it's by design.)
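The karma rules as described in this comment can be sketched as below. This is only a restatement of the commenter's description, not HN's actual code: the `Item` type and the `karma` function are hypothetical, and any caps or other adjustments the site applies are left out.

```python
from dataclasses import dataclass


@dataclass
class Item:
    kind: str          # "submission" or "comment"
    upvotes: int = 0
    downvotes: int = 0  # per the comment above, only comments can have these
    flags: int = 0      # flags are tracked but do not change karma


def karma(items):
    """Total karma: +1 per upvote, -1 per comment downvote, flags ignored."""
    total = 0
    for item in items:
        if item.kind == "submission" and item.downvotes:
            raise ValueError("submissions can be flagged but not downvoted")
        total += item.upvotes - item.downvotes
    return total


if __name__ == "__main__":
    history = [
        Item("submission", upvotes=10, flags=3),
        Item("comment", upvotes=2, downvotes=50),
    ]
    print(karma(history))  # 10 + (2 - 50) = -38
```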
> Karma isn’t meaningless. While plenty of users with low karma submitted posts that ascended to the front page, it looks like posts submitted by influencial users get to the front page more quickly (though the correlation is pretty weak).
Very few of my submitted articles make the front page. I'm tempted to say something about correlation and causation here - people who submit articles that make the front page get lots of karma, rather than users with lots of karma get front pages more easily.
About points per minute: There are some anti-gaming algorithms that will penalise a submission if it gets too many votes too quickly. I have no idea what the correct rate is.
There is also, I think, some algorithm that will detect many comments by new accounts. Announcing a post on social media is a mistake that some people make. New accounts then visit and upvote the submission, which demotes rather than promotes it.
The effect "High karma can give you a little boost" is not due to HN karma itself. Many contributors to HN already have a reputation for good content, and thus people are more likely to read and vote on new submissions from them. So the boost is earned by the contributors by having impressed the HN audience, not granted by HN solely based on past karma.
Oh right, so it really is a calculation of points vs age, according to the formula, plus penalties -- which my analysis didn't catch. Thanks for the background.
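For reference, a sketch of the points-versus-age calculation in the form it is commonly cited. The exponents (0.8 and 1.8), the two-hour offset, and the single multiplicative `penalty` factor are assumptions taken from that commonly cited form; this thread does not confirm them, and the site's real penalties are not public in detail.

```python
def rank_score(points, age_hours, gravity=1.8, penalty=1.0):
    """Commonly cited form: (points - 1)^0.8 / (age + 2)^gravity * penalty.

    All constants are assumptions; `penalty` is a hypothetical stand-in
    for whatever anti-gaming or moderation adjustments the site applies.
    """
    base = max(points - 1, 0) ** 0.8
    return penalty * base / (age_hours + 2) ** gravity


if __name__ == "__main__":
    # Same points, different ages: the older story scores lower.
    print(rank_score(points=50, age_hours=1))
    print(rank_score(points=50, age_hours=6))
    # Same story with a penalty applied.
    print(rank_score(points=50, age_hours=1, penalty=0.5))
```

Under these assumptions a story needs a steady flow of new points just to hold its position, since the age term in the denominator keeps growing.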
This is a nice analysis, although the causes of a high points per minute score could presumably be related to whether it gained an early vote / comment from a user with high karma (who is effectively giving it a stamp of approval).
This only had two points after 20 minutes when I voted for it though, so by their own logic it is unlikely to make it to the FP!
Hah, good point! Much to my chagrin, the theory I proposed is challenged by my own post. It looks like someone with a lot of karma can boost a post to the front page (there are some outliers in the analysis). Maybe we can get individual votes exposed via the API to know for sure.
This post is interesting but it falls short. Ultimately, if you can get a few different people to upvote your submissions, you can exploit HN. You can see we're often being marketed at by startups that are routinely on the front page with content that is irrelevant to their own customers.
[1] Salganik, Dodds, Watts, "Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market", 2006: https://www.princeton.edu/~mjs3/salganik_dodds_watts06_full....
[2] A popular article by one of the authors, the inestimable Duncan Watts: http://www.nytimes.com/2007/04/15/magazine/15wwlnidealab.t.h...