by photochemsyn
854 days ago
I've found this approach works really well with JavaScript and Puppeteer for the first stage, then Python for the second stage (the re module for regular expressions is nice here, IMO). JS/Puppeteer seems a bit easier for things like rotating user agents. From the article:

> "Websites often block scrapers via blocked IP ranges or blocking characteristic bot activity through heuristics. Solutions: Slow down requests, properly mimic browsers, rotate user agents and proxies."
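
For what it's worth, here is a rough Python sketch of the two ideas above. It assumes the first stage has already saved the raw HTML and that the second stage only needs to pull links out of it. The user-agent strings and the names `next_headers`, `extract_links` and `LINK_RE` are made up for illustration, and a regex like this only copes with simple, regular markup.

```python
import itertools
import re

# Hypothetical user-agent strings (assumption); copy real ones from current browsers.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_0) ExampleBrowser/2.0",
]

# Endless round-robin iterator over the list above.
_ua_cycle = itertools.cycle(USER_AGENTS)


def next_headers():
    """Return request headers carrying the next user agent in the rotation."""
    return {"User-Agent": next(_ua_cycle)}


# Second stage: match <a ... href="...">text</a> in already-downloaded HTML.
LINK_RE = re.compile(
    r'<a\s+[^>]*href="([^"]+)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def extract_links(html):
    """Return a list of (href, link text) pairs found in the HTML string."""
    return [(href, text.strip()) for href, text in LINK_RE.findall(html)]
```

The headers dict can be passed to whatever HTTP client does the fetching, ideally with a pause between requests, as the article suggests.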