by tdedecko 5995 days ago
Can you provide more information about how the sampling was done and how you categorized the articles?

The graph also needs better labelling. For example, what is the x-axis? Is it snapshots over time?
The x-axis is the rank number of the posting divided by 1,000, so the sampling interval is constant in blocks of 1,000 posts, but it is more compressed in time towards the right because of the higher posting frequency.
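To make the binning concrete, here is a minimal sketch of grouping posts into blocks of 1,000 by rank and computing each category's share per block. It assumes each post is a `(rank, category)` pair; the function name, the data layout, and the sample categories are hypothetical and not taken from the actual analysis.

```python
from collections import Counter, defaultdict


def category_share_by_block(posts, block_size=1000):
    """Return {block: {category: fraction}} for posts given as (rank, category).

    The block index is rank // block_size, i.e. the rank divided by 1,000
    and rounded down, which corresponds to the x-axis described above.
    """
    per_block = defaultdict(Counter)
    for rank, category in posts:
        per_block[rank // block_size][category] += 1

    shares = {}
    for block in sorted(per_block):
        total = sum(per_block[block].values())
        shares[block] = {c: n / total for c, n in per_block[block].items()}
    return shares


# Hypothetical sample data, for illustration only.
sample_posts = [
    (1, "startups"),
    (2, "programming"),
    (999, "startups"),
    (1000, "programming"),
]
shares = category_share_by_block(sample_posts)
```

Because every block holds the same number of posts, the blocks on the right cover less wall-clock time than the ones on the left.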
It would make more sense to use a constant time x-axis. Also, how did you do the article labeling/clustering?
It wouldn't make much difference actually, apart from greatly complicating the matching up of the Y axis.

The bigger issue is that this is just everything that was posted and not flagged, so it is, if you wish, a view of the 'new' page; it has nothing to do with the 'home' page. I'll try to address that tomorrow.

As for the labelling and clustering, that was based on keywords in the titles of a fair-sized sample, and on the URLs the links pointed to.
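A rough sketch of what such keyword-and-URL labelling could look like follows. The keyword lists, the domain table, the category names, and the rule that the domain is checked before the title are all assumptions made for illustration; the real lists used for the graph are not given here.

```python
from urllib.parse import urlparse

# Hypothetical keyword and domain tables, not the ones used for the graph.
TITLE_KEYWORDS = {
    "startups": ["startup", "funding", "founder"],
    "programming": ["python", "lisp", "compiler"],
}
DOMAIN_CATEGORIES = {
    "techcrunch.com": "startups",
    "github.com": "programming",
}


def label_post(title, url):
    """Assign a category from the link's domain, else from title keywords."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host in DOMAIN_CATEGORIES:
        return DOMAIN_CATEGORIES[host]

    lowered = title.lower()
    for category, keywords in TITLE_KEYWORDS.items():
        # Plain substring match; crude, but enough for coarse trends.
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"
```

Substring matching like this is crude, which fits the goal of catching only the larger trends.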

What I am specifically searching for is larger trends; smaller trends would be very difficult to catch using this method.

I'm actually quite surprised how even the graphs come out over the longer term; I would have expected more variation in the submissions.

So if there is a problem at this point in time, I would conclude that it is not in the submissions: they seem to have roughly the same subjects over the long term as they did in the beginning, with the exception of a shift of focus away from 'startups' in the first year or so of operation.

I think that has to do with an influx of programmers / people interested in technology in general whereas originally most of the people on news.yc were active in the startup scene.

> they seem to have roughly the same subjects over the long term as they did in the beginning

That's been my impression for a long time. Do your techniques allow you to measure the trend of people complaining about the site deteriorating? Because that's been going on for a long time too, and in approximately the same way (though possibly in cycles).

> I think that has to do with an influx of programmers / people interested in technology in general whereas originally most of the people on news.yc were active in the startup scene.

Pretty clearly that is because the site was originally named Startup News and had a relatively narrow scope, then was renamed to Hacker News as part of explicitly broadening the scope.

> Do your techniques allow you to measure the trend of people complaining about the site deteriorating?

No, especially since plenty of those get flagged and die.

I'm pretty sure the number on the x-axis is the number of weeks since the founding of HN.
Eventually I'll release the whole dataset.