by chrismarlow9
3930 days ago
Assuming you're not bound by rate limiting on the remote hosts, the average page crawled is under 1 MB, and you're running on something comparable to a medium EC2 instance, then yeah, I would say that is fairly slow. I've written more web crawlers than I can count in PHP, Python, Scala, Go, Node.js, and Perl. Right now, assuming you just want to pull some form of JSON or HTML out of the response, I would use Go and gokogiri with XPath queries (and of course json.Unmarshal for JSON). It will make you laugh at 120k per day. Feel free to ping me if you would like to discuss making me one of those freelancers.