by Xeroday 4700 days ago
Very cool project. I'm curious as to how you're getting all these robots - are you scraping them yourself?

Downloading them as we speak. I have a big list of hosts/domains that I collected through spidering for my DNSDigger.com. This is a hobby project that has grown a bit over my head, hehe. And there is no scraping needed: robots.txt files are just simple text files. Download and parse, repeat a couple of million times, and build an index :)
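
For anyone curious what "download and parse, then build an index" might look like, here is a rough Python sketch. It is not the actual code behind the project. The helper names (`fetch_robots`, `parse_robots`, `build_index`), the plain `http://` URL, and the choice to index disallowed paths by host are all assumptions made for illustration.

```python
from collections import defaultdict
from urllib.request import urlopen


def fetch_robots(host, timeout=10):
    """Download http://<host>/robots.txt as text (hypothetical helper)."""
    with urlopen(f"http://{host}/robots.txt", timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_robots(text):
    """Return (user_agent, directive, value) tuples from robots.txt text."""
    rules = []
    agents = []        # user-agents of the group currently being read
    in_rules = False   # True once the current group has seen a rule line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            for agent in agents:
                rules.append((agent, field, value))
    return rules


def build_index(robots_by_host):
    """Map each disallowed path to the set of hosts that list it (assumed index shape)."""
    index = defaultdict(set)
    for host, text in robots_by_host.items():
        for _agent, directive, value in parse_robots(text):
            if directive == "disallow" and value:
                index[value].add(host)
    return index
```

In a real run you would loop over the host list, call `fetch_robots` on each one with error handling and some politeness delay, and pass the results to `build_index`. The parser here only handles `User-agent`, `Allow`, and `Disallow`; other fields such as `Sitemap` are skipped.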