by toomuchtodo
4582 days ago
You just ignore the robots.txt file and crawl slowly from distributed virtual machines. Not that you should do that. Robots.txt is only a nicety, though: the client doesn't have to respect it, and the server doesn't have to allow your HTTP requests.
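
To make the "nicety" point concrete, here is a minimal sketch of what respecting robots.txt looks like from the client side, using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, the `example.com` URLs, and the `build_parser` helper are all made up for illustration. The check runs entirely in the crawler's own code, which is why nothing forces a client to perform it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed locally (no network access needed).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The crawler asks the parser before each request. Skipping this step is
# exactly what "ignoring robots.txt" means; the server is never consulted.
allowed = parser.can_fetch("ExampleBot", "https://example.com/index.html")
blocked = parser.can_fetch("ExampleBot", "https://example.com/private/data.html")

# A polite crawler would also sleep this many seconds between requests.
delay = parser.crawl_delay("ExampleBot")

print(allowed, blocked, delay)
```

Actual enforcement, if any, has to happen on the server (rate limiting, blocking by IP or user agent), which is the other half of the point above.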