by laughfactory
3251 days ago
I do it using rotating proxies, stripping cookies between requests, randomly varying the delay between requests, randomly selecting a valid user-agent string, etc. It's a pain in the butt. And scraping more than I do, faster than I do, would be pretty freaking expensive in terms of time and money. Note that Google is pretty aggressive about captcha-ing "suspicious" activity and/or throttling responses to suspicious requests. You can easily trigger a captcha with your own manual searching: search for something, go to page 10, and repeat maybe 5-20 times, and you'll see a captcha challenge. If Google gets more serious about blocking me, then I'll use ML to overcome their ML (which should be doable because they're always worried about keeping Search consumer-friendly).
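
For anyone wondering what those four pieces look like together, here is a rough sketch using only the Python standard library. It is not my actual setup. The proxy addresses, the user-agent strings, the 2-10 second delay range, and the function names (`pick_delay`, `build_opener`, `fetch_all`) are all placeholders made up for illustration. A fresh opener is built for every request and no cookie handler is attached, so nothing carries over between requests.

```python
import random
import time
import urllib.request

# Hypothetical placeholders: substitute your own proxy pool and user agents.
PROXIES = [
    "http://proxy-a.example:8080",
    "http://proxy-b.example:8080",
]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
]


def pick_delay(rng, low=2.0, high=10.0):
    """Return a random pause in seconds (the range is an assumed example)."""
    return rng.uniform(low, high)


def build_opener(rng):
    """Build a one-shot opener with a random proxy and user agent.

    No HTTPCookieProcessor is added, so the opener never stores cookies.
    """
    proxy = rng.choice(PROXIES)
    agent = rng.choice(USER_AGENTS)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    opener.addheaders = [("User-Agent", agent)]  # replace the default header
    return opener, proxy, agent


def fetch_all(urls, rng=None, sleep=time.sleep):
    """Fetch each URL through a fresh opener, pausing randomly in between."""
    rng = rng or random.Random()
    pages = []
    for url in urls:
        opener, _proxy, _agent = build_opener(rng)
        with opener.open(url, timeout=30) as response:
            pages.append(response.read())
        sleep(pick_delay(rng))
    return pages
```

Calling `fetch_all(["https://example.com/"])` would route the request through one of the placeholder proxies, so it only works once real proxy addresses are filled in. The cost I mentioned shows up in exactly those lists: good proxies are what you pay for, and the delays are what make it slow.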