The x axis is the rank number of the posting divided by 1000, so each step is a constant sampling interval of 1,000 postings, but the axis is more compressed in time towards the right because posting frequency is higher there.
It wouldn't actually make much difference, apart from greatly complicating the matching up of the y axis.
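To make the x axis concrete, here is a minimal sketch of the bucketing described above, assuming each posting has already been given a category label and is available as a hypothetical `(rank, category)` pair. Integer-dividing the rank by 1,000 gives the bucket index, and each bucket's category counts are normalised to fractions so every x step covers the same number of postings regardless of how much calendar time it spans.

```python
from collections import Counter, defaultdict

BUCKET_SIZE = 1000  # postings per x-axis step

def category_shares(labelled_posts, bucket_size=BUCKET_SIZE):
    """Return {bucket_index: {category: fraction_of_bucket}}.

    labelled_posts: iterable of (rank, category) pairs, where rank is the
    posting's sequence number (hypothetical input format, for illustration).
    """
    counts = defaultdict(Counter)
    for rank, category in labelled_posts:
        # Rank 0..999 -> bucket 0, 1000..1999 -> bucket 1, and so on.
        counts[rank // bucket_size][category] += 1

    shares = {}
    for bucket, counter in sorted(counts.items()):
        total = sum(counter.values())
        shares[bucket] = {cat: n / total for cat, n in counter.items()}
    return shares
```

For example, `category_shares([(0, "startups"), (1, "programming"), (999, "startups"), (1000, "programming")])` puts the first three postings in bucket 0 (two thirds "startups") and the last one in bucket 1.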
The bigger issue is that this is just everything that is posted and not flagged, so it is, if you wish, a view of the 'new' page; it has nothing to do with the 'home' page. I'll try to address that tomorrow.
As for the labelling and clustering, that was based on keywords in the titles of a fair-sized sample, and on the URLs the links pointed to.
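The exact keyword lists used for the labelling aren't given here, so the following is only a sketch of what title-keyword plus link-domain labelling could look like. The categories, keywords and domains in `CATEGORY_KEYWORDS` and `CATEGORY_DOMAINS` are hypothetical placeholders. Title keywords are checked first and the link's domain is used as a fallback; the first matching category wins, so dictionary order acts as a priority.

```python
from urllib.parse import urlparse

# Hypothetical keyword and domain lists, for illustration only.
CATEGORY_KEYWORDS = {
    "startups": {"startup", "founder", "funding", "vc", "yc"},
    "programming": {"python", "lisp", "compiler", "javascript", "haskell"},
}
CATEGORY_DOMAINS = {
    "programming": {"github.com"},
    "startups": {"techcrunch.com"},
}

def label_post(title, url):
    """Label a posting from its title words, falling back to the link's domain."""
    # Lowercase title words with surrounding punctuation removed.
    words = {w.strip(".,:;!?()\"'").lower() for w in title.split()}
    for category, keywords in CATEGORY_KEYWORDS.items():
        if words & keywords:
            return category

    # No keyword hit: try the host the link points to, ignoring a leading "www.".
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    for category, domains in CATEGORY_DOMAINS.items():
        if host in domains:
            return category
    return "other"
```

A rule-based labeller like this is crude, which fits the point below that only larger trends can be picked up reliably this way.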
What I am specifically searching for is larger trends; smaller trends would be very difficult to catch using this method.
I'm actually quite surprised at how even the graphs come out over the longer term; I would have expected more variation in the submissions.
So if there is a problem at this point in time, I would conclude that it is not in the submissions: they seem to have roughly the same subjects over the long term as they did in the beginning, with the exception of a shift of focus away from 'startups' in the first year or so of operation.
I think that has to do with an influx of programmers / people interested in technology in general, whereas originally most of the people on news.yc were active in the startup scene.
> they seem to have roughly the same subjects over the long term as they did in the beginning
That's been my impression for a long time. Do your techniques allow you to measure the trend of people complaining about the site deteriorating? Because that's been going on for a long time too, and in approximately the same way (though possibly in cycles).
> I think that has to do with an influx of programmers / people interested in technology in general whereas originally most of the people on news.yc were active in the startup scene.
Pretty clearly that is because the site was originally named Startup News and had a relatively narrow scope, and was then renamed Hacker News as part of an explicit broadening of that scope.