by perl4ever 2497 days ago
I was thinking about this. I'm lazy and not very talented, so is there any decent existing open-source code that could be adapted to spider only sites without advertising?

Just crawl, and then disregard sites that have certain scripts present.
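Here is a minimal sketch of the "certain scripts" check. It assumes your crawler already has each page's HTML and URL. The host list is an illustrative assumption, nowhere near a complete ad blocklist. The regex is a rough heuristic, not a real HTML parser. The names `AD_SCRIPT_HOSTS`, `scriptSources` and `hasAdScript` are hypothetical.

```javascript
// Illustrative, far-from-complete list of ad-serving script hosts (assumption).
const AD_SCRIPT_HOSTS = [
  'pagead2.googlesyndication.com',
  'securepubads.g.doubleclick.net',
];

// Return the src attribute of every <script> tag in the HTML.
// A regex is a rough heuristic here, not a real HTML parser.
function scriptSources(html) {
  const re = /<script\b[^>]*\bsrc\s*=\s*["']([^"']+)["']/gi;
  return [...html.matchAll(re)].map((m) => m[1]);
}

// True if any script on the page is loaded from a listed host
// (or a subdomain of one). Relative src values are resolved against pageUrl.
function hasAdScript(html, pageUrl, adHosts = AD_SCRIPT_HOSTS) {
  return scriptSources(html).some((src) => {
    let host;
    try {
      host = new URL(src, pageUrl).hostname;
    } catch {
      return false; // ignore malformed src values
    }
    return adHosts.some((h) => host === h || host.endsWith('.' + h));
  });
}
```

For example, a page that loads `adsbygoogle.js` from `pagead2.googlesyndication.com` is flagged, and a page that only loads its own `/js/app.js` is not. A real crawler would probably swap in a maintained filter list and a proper HTML parser.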
Yeah, technically, it'd be very easy. I've written countless spiders, crawlers, etc.

Basically, only index pages where webpage.indexOf('google') < 0.
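To tie the crawl and the filter together, here is a minimal breadth-first crawler sketch that applies that `indexOf('google') < 0` test. `crawl` and `fetchPage` are hypothetical names. `fetchPage(url)` is a function you would supply, for instance a wrapper around `fetch`, that resolves to the page's HTML or to `null` on failure. The sketch also follows links out of rejected pages. That is a design choice made here so that ad-free pages reachable only through ad-bearing pages still get found.

```javascript
// Minimal breadth-first crawler that indexes only pages whose source does
// not contain the string 'google'. fetchPage(url) is a hypothetical async
// function you supply: it should resolve to the page's HTML, or null.
async function crawl(startUrl, fetchPage, maxPages = 100) {
  const index = new Map(); // url -> html for pages that pass the filter
  const seen = new Set([startUrl]);
  const queue = [startUrl];
  let fetched = 0;

  while (queue.length > 0 && fetched < maxPages) {
    const url = queue.shift(); // FIFO queue gives breadth-first order
    const html = await fetchPage(url);
    fetched++;
    if (html === null) continue;

    // The filter: skip any page that mentions 'google' anywhere in its source.
    if (html.indexOf('google') < 0) {
      index.set(url, html);
    }

    // Follow links from every page, including rejected ones.
    for (const m of html.matchAll(/href\s*=\s*["']([^"'#]+)/gi)) {
      let next;
      try {
        next = new URL(m[1], url);
      } catch {
        continue; // skip malformed hrefs
      }
      if (next.protocol !== 'http:' && next.protocol !== 'https:') continue;
      if (!seen.has(next.href)) {
        seen.add(next.href);
        queue.push(next.href);
      }
    }
  }
  return index;
}
```

Keep in mind that `indexOf` is case-sensitive. The test also drops pages that merely mention Google or that load non-advertising Google resources such as fonts or analytics, so it errs heavily toward exclusion.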