Hacker News
by 1vuio0pswjnm7 660 days ago
Does CC publish the methodology for how they determine what to crawl? More specifically, how do they determine what not to crawl?
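On the "what not to crawl" part: one common mechanism any well-behaved crawler uses is honoring each site's robots.txt. Below is a minimal sketch using Python's standard urllib.robotparser. The robots.txt contents and the allowed() helper are hypothetical, and "CCBot" is used as the user-agent string on the assumption that it matches Common Crawl's crawler. This is not a description of CC's full selection methodology, which is exactly what the question asks about.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site. A real crawler would fetch
# this from https://<host>/robots.txt before requesting any page on that host.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed("CCBot", "https://example.com/page"))           # False: CCBot is blocked site-wide
print(allowed("OtherBot", "https://example.com/page"))        # True: only /private/ is blocked for others
print(allowed("OtherBot", "https://example.com/private/x"))   # False
```

Sites that block a crawler this way simply never show up in its corpus, which is one plausible reason some large sites are absent from a crawl.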

Yes, a few big sites are missing, notably Reddit. Most of CC is spam, though; the genuinely useful content is a really small fraction.

I'm experimenting with my own search engine at the moment and am considering making it public at some point. It's not as impossible a task as it sounds!
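To give a sense of why the core isn't impossible: the central data structure of a text search engine is an inverted index, mapping each term to the documents that contain it. The sketch below is a toy illustration under simple assumptions (naive lowercase tokenizer, boolean AND queries, in-memory dict); it is not the commenter's engine, and the sample documents are made up. A real engine would add crawling, ranking (e.g. BM25), and on-disk storage.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Deliberately naive: lowercase, then keep runs of ASCII letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Inverted index: term -> set of document IDs containing that term.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    # Boolean AND: a document matches only if it contains every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical sample documents standing in for crawled pages.
docs = {
    "a": "Common Crawl publishes monthly web crawls",
    "b": "Building a small search engine is not impossible",
    "c": "A search engine needs a crawler and an index",
}
index = build_index(docs)
print(sorted(search(index, "search engine")))  # ['b', 'c']
print(sorted(search(index, "crawler index")))  # ['c']
```

Using index.get() rather than index[term] for lookups keeps the defaultdict from growing an empty entry for every unknown query term.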