by ColinWright 4083 days ago

Not necessarily statistically significant, but of the past 1000 articles submitted:

      8 (bbc.co.uk)
      9 (techcrunch.com)
     10 (arstechnica.com)
     14 (nytimes.com)
     14 (theguardian.com)
     16 (washingtonpost.com)
     16 (youtube.com)
     18 (wsj.com)
     32 (medium.com)
     46 (github.com)

Of the past 10,000:

     40 (kickstarter.com)
     40 (reddit.com)
     40 (theatlantic.com)
     44 (forbes.com)
     46 (bloomberg.com)
     46 (securityaffairs.co)
     47 (theverge.com)
     56 (bbc.co.uk)
     62 (washingtonpost.com)
     68 (arstechnica.com)
     69 (bbc.com)
     70 (businessinsider.com)
     74 (wired.com)
     82 (wikipedia.org)
    102 (wsj.com)
    105 (theguardian.com)
    157 (nytimes.com)
    159 (techcrunch.com)
    163 (youtube.com)
    339 (medium.com)
    485 (github.com)

In case you're wondering, I have a file of all submissions listing ID, userid, URL, and title. Then I did this:

    $ tail -n 10000 records   \
        | gawk '{print $NF}'  \
        | sort                \
        | uniq -c             \
        | sort -n             \
        | grep -n .           \
        | tac                 \
        | head -21

2 comments

kaolinite 4083 days ago

Wow, thanks Colin - really cool information. Where do you get the contents of 'records' from? Do you have a script that crawls occasionally?

If we merge bbc.com and bbc.co.uk, we end up with 125 / 10,000. I suppose that isn't that many compared to others, but it's still higher than I think it should be. ArsTechnica (which often runs the same articles, such as this SpaceX one) only has 68 / 10,000 and the articles are written with a lot more technical detail.
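The merged figure can be checked against the counts in the post. A minimal sketch, assuming the `uniq -c` output format shown above (a count followed by the domain in parentheses). The `merge_bbc` name and the rule of folding `bbc.com` into `bbc.co.uk` are illustrative assumptions, not part of the original pipeline:

```shell
# Hypothetical helper: sum the counts for bbc.com and bbc.co.uk.
# Input: lines like "     56 (bbc.co.uk)" as produced by `uniq -c`.
merge_bbc() {
    awk '
        # Fold both BBC domains into one key; other domains pass through.
        { dom = $2; if (dom == "(bbc.com)") dom = "(bbc.co.uk)"; total[dom] += $1 }
        END { for (d in total) printf "%7d %s\n", total[d], d }
    ' | sort -n
}

# Counts taken from the 10,000-article list above.
printf '%s\n' '     56 (bbc.co.uk)' '     69 (bbc.com)' '     68 (arstechnica.com)' | merge_bbc
```

With those three input lines, the BBC total comes out as 125, which puts it above arstechnica.com's 68.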

Nevertheless, I'm not really sure what can be done about it. We can't ban the BBC from HN, as with BuzzFeed, because that's over the top - there's some good content. A nice solution might be to remind people, on the submission page, that it's better to go to the source - or at least a good, technical write-up - rather than a news post that is written for the general public.

ColinWright 4083 days ago

  > Where do you get the contents of 'records' from?

I download the "newest" and "news" pages every 15 minutes or so, then collate the data.
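A rough sketch of what the collation step could look like, since pages scraped every 15 minutes will repeat most submissions. It assumes each snapshot has already been reduced to one line per submission with the ID as the first whitespace-separated field. The `collate` name and the sample lines are made up for illustration, and the real file layout is not described in the thread:

```shell
# Hypothetical collation: keep only the first line seen for each ID.
# Assumes the submission ID is the first whitespace-separated field.
collate() {
    awk '!seen[$1]++' "$@"
}

# Made-up sample: ID 102 appears twice, as it would in overlapping scrapes.
printf '%s\n' \
    '101 alice http://a.example/x Title A (a.example)' \
    '102 bob http://b.example/y Title B (b.example)' \
    '102 bob http://b.example/y Title B (b.example)' \
    '103 carol http://c.example/z Title C (c.example)' | collate
```

The `!seen[$1]++` idiom prints a line only the first time its key appears, so input order is preserved and no sort is needed.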

  > If we merge bbc.com and bbc.co.uk, we
  > end up with 125 / 10,000.  I suppose
  > that isn't that many compared to others,

It's more than some, but not as many as, say, nytimes.com or techcrunch.com. But it's a major news site, so I'm not surprised people read it and go "Oh, that's interesting, I'll click the HN bookmarklet and submit it."

  > ... still higher than I think it should be.

What do you think it "should be"?

  > A nice solution might be to remind people,
  > ... that it's better to go to the source ...

I personally find it useful to read a non-technical overview, and then if interested, go and find the technical version. Often the article with technical details borders on unreadable.

Ntrails 4083 days ago

Could we get some kind of simple mean/variance analysis on "upvotes" by site?

Clearly the BBC isn't being spammed to generate upvotes.
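One way such an analysis might look, as a rough sketch. It assumes a points value is available for each submission, which the `records` file described in the post (ID, userid, URL, title) does not contain. The `<points> <domain>` input format, the `site_stats` name, and the sample numbers are all hypothetical:

```shell
# Hypothetical input: one line per submission, "<points> <domain>".
# Prints per-domain count, mean, and population variance of points.
site_stats() {
    awk '
        { n[$2]++; sum[$2] += $1; sumsq[$2] += $1 * $1 }
        END {
            for (d in n) {
                mean = sum[d] / n[d]
                # Population variance: E[x^2] - (E[x])^2
                var = sumsq[d] / n[d] - mean * mean
                printf "%s n=%d mean=%.2f var=%.2f\n", d, n[d], mean, var
            }
        }
    ' | sort
}

# Made-up points, for illustration only.
printf '%s\n' '2 bbc.co.uk' '4 bbc.co.uk' '6 bbc.co.uk' \
              '10 github.com' '30 github.com' | site_stats
```

The sum-of-squares formula needs only one pass over the data, but it can lose precision when points are large and tightly clustered. With real data a two-pass or Welford-style update would be safer.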
