by robbs
1199 days ago
IMO, this is the hardest part of maintaining a web scraper. We had ~100 scripts scraping ~1000 clients' sites, and keeping up with site changes took at least 50 hours a week. The second-hardest part was that 30% of our clients used the same hosting provider, which would start to fail at 10-20 req/s, so we had to throttle requests by IP across the whole cluster.
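For anyone wondering what throttling by IP looks like, here is a rough single-process sketch in Python. It is not the setup described above: the class name `PerIPThrottle`, the 5 req/s default (picked only to sit below the 10-20 req/s failure range), and the in-memory dict are all assumptions for illustration. A cluster-wide version would presumably need the per-IP state in a store shared by every worker.

```python
import socket
import threading
import time
from urllib.parse import urlsplit


class PerIPThrottle:
    """Enforce a minimum interval between requests to the same IP address.

    Hypothetical helper: hostnames are resolved to IPs so that many sites
    on one shared host count against a single budget.
    """

    def __init__(self, max_rate_per_sec=5.0, resolve=socket.gethostbyname,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_rate_per_sec
        self._resolve = resolve
        self._clock = clock
        self._sleep = sleep
        self._next_slot = {}  # ip -> earliest time the next request may start
        self._lock = threading.Lock()

    def reserve(self, url):
        """Book the next free slot for the URL's IP; return seconds to wait."""
        ip = self._resolve(urlsplit(url).hostname)
        with self._lock:
            now = self._clock()
            slot = max(now, self._next_slot.get(ip, now))
            # Push the next slot out so later callers queue behind this one.
            self._next_slot[ip] = slot + self.min_interval
        return slot - now

    def wait(self, url):
        """Block until this request is allowed to go out."""
        delay = self.reserve(url)
        if delay > 0:
            self._sleep(delay)
```

Each scraper would call `throttle.wait(url)` right before fetching. Keying on the resolved IP rather than the hostname is what lets different client domains on the same provider share one limit.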