Hacker News
by
perl4ever
2497 days ago
I was thinking about this. I'm lazy and not very talented, so is there any decent open-source code that could be adapted to spider only sites without advertising?
1 comments
ErikAugust
2497 days ago
Just crawl, and then disregard sites that have certain scripts present.
anon1m0us
2496 days ago
Yeah, technically, it'd be very easy. I've written countless spiders, crawlers, etc.
Basically, only index items where webpage.indexOf('google') < 0
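A rough sketch of that filter, combined with the "disregard sites that have certain scripts present" idea from the other comment. Everything named here is an assumption for illustration: the `AD_HOSTS` list is a tiny hypothetical sample rather than a maintained blocklist, the `{ url, html }` page shape is made up, and the regex is a shortcut where a real crawler would probably use an HTML parser. It checks script sources instead of the whole page text, since a bare `indexOf('google')` would also throw out pages that merely mention Google.

```javascript
// Hypothetical sample of ad-serving hosts; a real crawler would need a
// maintained blocklist instead of this short illustrative list.
const AD_HOSTS = ["googlesyndication.com", "doubleclick.net", "adnxs.com"];

// Collect the src attribute of every <script> tag using a simple regex.
// Adequate for a sketch; it will miss inline scripts and unusual markup.
function scriptSources(html) {
  const sources = [];
  const pattern = /<script\b[^>]*\bsrc\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    sources.push(match[1]);
  }
  return sources;
}

// True if any external script on the page is loaded from a listed ad host.
function hasAdScripts(html) {
  return scriptSources(html).some((src) =>
    AD_HOSTS.some((host) => src.includes(host))
  );
}

// Keep only the crawled pages that load no scripts from listed ad hosts.
// Each page is assumed to look like { url, html }.
function filterAdFree(pages) {
  return pages.filter((page) => !hasAdScripts(page.html));
}
```

You'd run `filterAdFree` over whatever the crawler fetched and index only what comes back. Ads injected by inline scripts or served from first-party domains would slip through, so treat this as a starting point.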