by ColinWright 4036 days ago
Not necessarily statistically significant, but of the past 1,000 articles submitted:

      8 (bbc.co.uk)
      9 (techcrunch.com)
     10 (arstechnica.com)
     14 (nytimes.com)
     14 (theguardian.com)
     16 (washingtonpost.com)
     16 (youtube.com)
     18 (wsj.com)
     32 (medium.com)
     46 (github.com)
Of the past 10,000:

     40 (kickstarter.com)
     40 (reddit.com)
     40 (theatlantic.com)
     44 (forbes.com)
     46 (bloomberg.com)
     46 (securityaffairs.co)
     47 (theverge.com)
     56 (bbc.co.uk)
     62 (washingtonpost.com)
     68 (arstechnica.com)
     69 (bbc.com)
     70 (businessinsider.com)
     74 (wired.com)
     82 (wikipedia.org)
    102 (wsj.com)
    105 (theguardian.com)
    157 (nytimes.com)
    159 (techcrunch.com)
    163 (youtube.com)
    339 (medium.com)
    485 (github.com)
In case you're wondering, I have a file of all submissions listing ID, userid, URL, and title. Then I did this:

    $ tail -n 10000 records   \
        | gawk '{print $NF}'  \
        | sort                \
        | uniq -c             \
        | sort -n             \
        | grep -n .           \
        | tac                 \
        | head -21
2 comments

Wow, thanks Colin - really cool information. Where do you get the contents of 'records' from? Do you have a script that crawls occasionally?

If we merge bbc.com and bbc.co.uk, we end up with 125 / 10,000. I suppose that isn't that many compared to others, but it's still higher than I think it should be. Ars Technica (which often runs the same articles, such as this SpaceX one) has only 68 / 10,000, and its articles are written with a lot more technical detail.
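For anyone who wants to redo that merge mechanically, here is a minimal sketch. It assumes input in the "COUNT (domain)" shape that `uniq -c` produces in the pipeline above; the function name `merge_domains` and the one-entry alias table are made up for illustration, not something from the original post.

```shell
# Sum counts for domains that should be treated as one site.
# Input: "COUNT (domain)" lines, as printed by `uniq -c`.
# The alias table below (bbc.co.uk -> bbc.com) is an illustrative assumption.
merge_domains() {
    awk '
        BEGIN { alias["(bbc.co.uk)"] = "(bbc.com)" }
        {
            # Map an aliased domain to its canonical name, then accumulate.
            site = ($2 in alias) ? alias[$2] : $2
            total[site] += $1
        }
        END { for (site in total) printf "%7d %s\n", total[site], site }
    ' | sort -n
}

# Example with three of the counts from the 10,000-article list.
printf '     56 (bbc.co.uk)\n     69 (bbc.com)\n     68 (arstechnica.com)\n' | merge_domains
```

With those three lines the BBC entries collapse into a single row of 56 + 69 = 125, listed after arstechnica.com at 68.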

Nevertheless, I'm not really sure what can be done about it. We can't ban the BBC from HN, as with BuzzFeed, because that's over the top - there's some good content. A nice solution might be to remind people, on the submission page, that it's better to go to the source - or at least a good, technical write-up - rather than a news post that is written for the general public.

  > Where do you get the contents of 'records' from?
I download the "newest" and "news" pages every 15 minutes or so, then collate the data.
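A rough sketch of what such a periodic download could look like, not necessarily how it is actually done here. The snapshot directory, the function names, and the cron schedule in the comment are all assumptions; only the two page names come from the description above.

```shell
# Hypothetical snapshot helpers; directory and names are assumptions.
# A crontab entry along these lines would run a script every 15 minutes:
#   */15 * * * * /path/to/hn_snapshot.sh

SNAP_DIR="${SNAP_DIR:-./hn-snapshots}"

# Build the output path for one page and one timestamp.
snapshot_path() {
    printf '%s/%s-%s.html\n' "$SNAP_DIR" "$1" "$2"
}

# Fetch one listing page ("newest" or "news") and save it under SNAP_DIR.
fetch_page() {
    page=$1
    stamp=$(date -u +%Y%m%dT%H%M%S)
    mkdir -p "$SNAP_DIR"
    curl -fsS "https://news.ycombinator.com/$page" \
        -o "$(snapshot_path "$page" "$stamp")"
}

# Usage (performs network requests, so left commented out):
#   fetch_page newest
#   fetch_page news
```

Collating the saved pages into a single records file is a separate step and is not shown.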

  > If we merge bbc.com and bbc.co.uk, we
  > end up with 125 / 10,000.  I suppose
  > that isn't that many compared to others,
It's more than some, but not as many as, say, nytimes.com or techcrunch.com. But it's a major news site, so I'm not surprised people read it and go "Oh, that's interesting, I'll click the HN bookmarklet and submit it."

  > ... still higher than I think it should be.
What do you think it "should be?"

  > A nice solution might be to remind people,
  > ... that it's better to go to the source ...
I personally find it useful to read a non-technical overview, and then if interested, go and find the technical version. Often the article with technical details borders on unreadable.
Could we get some kind of simple mean/variance analysis on "upvotes" by site?

Clearly the BBC isn't being spammed to generate upvotes.
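A minimal sketch of that kind of analysis, under a loud assumption: the records file described above lists only ID, userid, URL, and title, so the scores would have to come from somewhere else. The input format here (one "domain score" pair per line) and the name `score_stats` are hypothetical.

```shell
# Per-site count, mean, and population variance of scores.
# Assumed input (hypothetical): one "domain score" pair per line.
score_stats() {
    awk '
        { n[$1]++; sum[$1] += $2; sumsq[$1] += $2 * $2 }
        END {
            for (site in n) {
                mean = sum[site] / n[site]
                # Population variance: E[x^2] - (E[x])^2
                var = sumsq[site] / n[site] - mean * mean
                printf "%s n=%d mean=%.2f var=%.2f\n", site, n[site], mean, var
            }
        }
    ' | sort
}

# Made-up scores, purely to show the shape of the output.
printf 'bbc.co.uk 10\nbbc.co.uk 30\ngithub.com 5\n' | score_stats
```

The sum-of-squares formula is fine for small integer scores like these; for large or widely spread values a running (Welford-style) update would be the safer choice.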