Hacker News
by andrewmcwatters 1123 days ago
I’m curious what the distribution of karma is on HN, but I haven’t yet bothered to write a polite scraper that reads through users’ profiles to grab the counts while making only slightly more GET requests than a human reading HN would.

I’ve been curious about this for a long time though, already having pulled statistics like distributions of GitHub followers and stargazers.

3 comments

It's pretty old (3+ years) but I did some sophomoric analysis on some HN data [0]. I just added a frequency-karma plot and it looks like a power law, which should be no big surprise. Using an MLE from John Cook [1], the exponent looks to be about -1.1.

[0] https://abetusk.github.io/yahnda/

[1] https://www.johndcook.com/blog/2015/11/24/estimating-the-exp...
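For readers who want to try an exponent fit themselves, here is a minimal sketch of a maximum-likelihood estimate. It uses the continuous-data estimator, alpha = 1 + n / sum(ln(x_i / x_min)), which is swapped in here for simplicity and is not necessarily the estimator described in [1]. The function names and the `x_min` cutoff are assumptions for illustration. Karma is an integer and can be zero or negative, so values below `x_min` are dropped, and the continuous formula is only an approximation for discrete data.

```python
import math
import random


def estimate_exponent(values, x_min=1.0):
    """Continuous-approximation MLE for alpha in p(x) ~ x**(-alpha), x >= x_min.

    Values below x_min are ignored (hypothetical cutoff, chosen by the caller).
    """
    tail = [v for v in values if v >= x_min]
    log_sum = sum(math.log(v / x_min) for v in tail)
    if log_sum == 0:
        raise ValueError("need at least one value strictly above x_min")
    return 1.0 + len(tail) / log_sum


def sample_power_law(alpha, n, x_min=1.0, seed=0):
    """Draw n synthetic samples by inverse-transform sampling (test data only)."""
    rng = random.Random(seed)
    # 1 - u lies in (0, 1], so the power below is always defined.
    return [
        x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
        for _ in range(n)
    ]


if __name__ == "__main__":
    synthetic = sample_power_law(alpha=2.5, n=20000)
    print(estimate_exponent(synthetic))
```

The estimate is reported as a positive alpha for a density proportional to x^(-alpha), so compare signs carefully against any figure quoted with the opposite convention.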

Nice work! This is basically exactly what I was curious about. Thanks for satisfying my itch. Kudos for the db snapshot, too!

> a polite scraping method

Shouldn't be needed, there's an API.

https://github.com/HackerNews/API
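As a rough sketch of what reading karma through that API could look like: the repository documents a per-user endpoint of the form `/v0/user/<id>.json` on `hacker-news.firebaseio.com`, whose JSON includes a `karma` field. The helper names below are made up for illustration, and treating a `null` body as "no such user" is an assumption. As far as I can tell the API has no endpoint that lists every user, so the usernames would still have to come from somewhere else (for example, the `by` field of items).

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://hacker-news.firebaseio.com/v0"


def user_url(username):
    """Build the per-user endpoint URL for a given username."""
    return f"{API_BASE}/user/{urllib.parse.quote(username, safe='')}.json"


def parse_karma(payload):
    """Return the karma from a JSON response body, or None if the body is null."""
    data = json.loads(payload)
    if data is None:  # assumption: unknown users come back as JSON null
        return None
    return data.get("karma")


def fetch_karma(username, timeout=10):
    """Fetch one user's karma over HTTP (one GET per user, so pace your requests)."""
    with urllib.request.urlopen(user_url(username), timeout=timeout) as resp:
        return parse_karma(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(fetch_karma("pg"))
```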

Oh thank you, I always forget about this thing.

Karma per account is a power law distribution. You can see the top 100 accounts here:

https://news.ycombinator.com/leaders

Everyone else has less karma than these folks… the long tail.

I haven’t graphed it, but my subjective sense is that my karma per comment is distributed similarly. Most comments get approximately 1 karma point, but occasionally one comment will get dozens or even hundreds of upvotes.

Well, of course it’s a power law, but that doesn’t tell you anything about whether 5000 karma is the 20th or the 33rd percentile.

If I'm querying 'abetusk's SQLite data correctly, 5000 karma would be the 99.64th percentile.

    sqlite> select count(*), sum(karma > 5000), sum(karma <= 5000) from users;
    558905|2018|556887
2018 from 558905 is 0.36%. Using the stats extension from sqlean confirms it.

    sqlite> select percentile(0+karma, 99.64) from users;
    4952.0
    sqlite> select percentile(0+karma, 99.65) from users;
    5070.50800000003
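The same percentile-rank arithmetic can be sketched without the sqlean extension, using Python's built-in `sqlite3` module. The `users` table and `karma` column names are taken from the queries above; the rows inserted here are toy data, not the real snapshot, and `percentile_rank` is a hypothetical helper name.

```python
import sqlite3


def percentile_rank(conn, threshold):
    """Percent of rows in `users` whose karma is at or below `threshold`."""
    total, at_or_below = conn.execute(
        "select count(*), sum(karma <= ?) from users", (threshold,)
    ).fetchone()
    # Note: on an empty table sum(...) is NULL, so this would fail; toy use only.
    return 100.0 * at_or_below / total


# Toy in-memory table standing in for the real snapshot (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("create table users (id text, karma integer)")
conn.executemany(
    "insert into users values (?, ?)",
    [("a", 1), ("b", 2), ("c", 50), ("d", 6000)],
)

if __name__ == "__main__":
    print(percentile_rank(conn, 5000))
    # The thread's own counts: 556887 of 558905 users at or below 5000 karma.
    print(round(100.0 * 556887 / 558905, 2))
```

Pointing `sqlite3.connect` at the downloaded database file instead of `":memory:"` should give the figure for the real data, assuming the column is stored as a number rather than text.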