by MichaelZuo 854 days ago
I used to believe this, but after enough time on the internet I no longer think so.

There are probably no more than 50k meaningfully unique sites with a notable amount of actually desirable information, after excluding all the SEO'ed sites, blogs repeating each other, etc., at least for the English web.

Manual curation is entirely possible, since there probably aren't even 50 such sites being created per day on average. That includes every single forum still open to public viewing. There really aren't that many left (<10k).


How long do you expect that will remain the case in the face of such a flood of zero-incremental-cost garbage as we here discuss?

Especially worth mentioning in this connection is https://news.ycombinator.com/item?id=39424688, as of this writing #1 on HN. I mention it here because what it says about moderation, and about centralized platforms being both the highest-value and most poorly managed targets, applies here also.

Forever, if they also have access to comparable tools to weed out lower quality sites.

Why would you expect otherwise, that intelligent people will suddenly lose their ability to perceive what's higher quality content?

There are a lot more sites than that when you throw in personal blogs.

The issue is that those are now impossible to find.

There aren't, if you exclude all the spam blogs, and include only the ones that are fully accessible without a paywall and have received an update in the last year.

A huge proportion have simply stopped updating, gone offline, or moved behind a paywall on Substack/Medium/etc.

The 50k number is all inclusive and probably even still an overestimate.

How did you derive that figure to begin with? And in what realm does only what's been posted in the last year qualify as information worth retaining the ability to retrieve?

It's an estimate based on my own experience. I'm not really sure what you're specifically asking for.

And I never mentioned whether something 'qualify as information worth retaining the ability to retrieve'... are you confused about what the comment chain is about?