Hacker News
by anon1m0us 2497 days ago
Anyone with me on a startup that only indexes the part of the internet Google doesn't touch?

  No Google AdWords-enabled sites
  No Google Analytics-enabled sites
  No sites referencing Google's CDN
  etc.
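One way to turn the list above into a check is to map each criterion to the hosts its scripts are loaded from, then scan a page's `src`/`href` attributes. This is a minimal sketch: the `GOOGLE_HOSTS` table is a hypothetical, non-exhaustive guess at relevant domains, and `googleFootprint` is an illustrative helper name, not an existing library function.

```javascript
// Hypothetical host lists for the three criteria; illustrative, not exhaustive.
const GOOGLE_HOSTS = {
  adwords: ["googleadservices.com", "googlesyndication.com", "doubleclick.net"],
  analytics: ["google-analytics.com", "googletagmanager.com"],
  cdn: ["ajax.googleapis.com", "fonts.googleapis.com", "gstatic.com"],
};

// True if `host` is `domain` itself or any subdomain of it.
function hostMatches(host, domain) {
  return host === domain || host.endsWith("." + domain);
}

// Scan src/href attribute values in raw HTML and return the sorted list of
// categories (adwords, analytics, cdn) the page references.
function googleFootprint(html, baseUrl = "http://example.invalid/") {
  const found = new Set();
  const attrRe = /\b(?:src|href)\s*=\s*["']([^"']+)["']/gi;
  for (const match of html.matchAll(attrRe)) {
    let host;
    try {
      // Resolve relative and protocol-relative URLs against a dummy base.
      host = new URL(match[1], baseUrl).hostname.toLowerCase();
    } catch {
      continue; // skip attribute values that are not valid URLs
    }
    for (const [category, domains] of Object.entries(GOOGLE_HOSTS)) {
      if (domains.some((d) => hostMatches(host, d))) found.add(category);
    }
  }
  return [...found].sort();
}
```

For example, `googleFootprint('<script src="https://www.google-analytics.com/analytics.js"></script>')` returns `["analytics"]`, while a page that only mentions Google in its text returns `[]`. Matching on hostnames rather than substrings avoids tripping on look-alike domains such as `notgoogle-analytics.com`.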
I don't want anything to do with them anymore at all. Ever.

Or anyone who has anything to do with them.

Can we take our internet back?


I was thinking about this. I'm lazy and not very talented, so is there any decent existing open-source code that could be adapted to spider only sites without advertising?
Just crawl, and then disregard sites that have certain scripts present.
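The crawl-then-disregard idea can be sketched as a breadth-first crawl that indexes a page only when a filter predicate passes. This assumes a hypothetical synchronous `getPage(url)` that returns HTML or `null`; a real crawler would fetch over HTTP asynchronously, respect robots.txt, and parse HTML properly instead of using a regex for links.

```javascript
// Breadth-first crawl sketch. `getPage(url)` is a hypothetical page source
// (URL -> HTML string, or null on failure); `isExcluded(html)` decides
// whether a page carries one of the unwanted scripts.
function crawl(seeds, getPage, isExcluded, maxPages = 1000) {
  const queue = [...seeds];
  const seen = new Set(seeds);
  const indexed = [];
  let fetched = 0;

  while (queue.length > 0 && fetched < maxPages) {
    const url = queue.shift();
    const html = getPage(url);
    fetched++;
    if (html == null) continue;

    // Disregard excluded pages for the index, but still follow their links
    // so clean sites reachable only through them are not lost.
    if (!isExcluded(html)) indexed.push(url);

    // Collect href targets, dropping any #fragment.
    for (const m of html.matchAll(/\bhref\s*=\s*["']([^"'#]+)/gi)) {
      let next;
      try {
        next = new URL(m[1], url).href; // resolve relative links
      } catch {
        continue;
      }
      if (next.startsWith("http") && !seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return indexed;
}
```

Whether to follow links out of excluded pages is a design choice; skipping them shrinks the crawl but can leave clean sites undiscovered.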
Yeah, technically, it'd be very easy. I've written countless spiders, crawlers, etc.

Basically, only index items where webpage.indexOf('google') < 0
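Taken literally, the one-liner is case-sensitive and scans the whole page, so it drops a page that merely says "google" in its text yet keeps one whose script only contains "Google". A hypothetical narrower variant, sketched below, looks only inside `<script>` elements and ignores case.

```javascript
// The one-liner wrapped as a predicate: keep the page only if the
// lowercase substring "google" appears nowhere in the HTML.
function keepNaive(webpage) {
  return webpage.indexOf("google") < 0;
}

// Hypothetical narrower variant: inspect only <script> elements
// (attributes plus inline body) and match case-insensitively.
function keepScriptsOnly(webpage) {
  const scripts = webpage.match(/<script\b[^>]*>[\s\S]*?<\/script>/gi) || [];
  return !scripts.some((s) => /google/i.test(s));
}
```

With `"<p>I quit google last year</p>"`, `keepNaive` returns `false` and `keepScriptsOnly` returns `true`; with `'<script>window.GoogleAnalyticsObject = "ga";</script>'` it is the reverse. The narrower version no longer sees Google references in `<link>`, `<img>` or `<iframe>` tags, so it trades false positives for false negatives.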