by 8chanAnon
797 days ago
I have a question for you. Why use Python at all? Or any of these Puppeteer-type apps? If you're trying to scrape web sites, wouldn't it make sense to do the scraping from inside the web browser rather than from outside with these clunky and complicated methods? I mean, write the code in HTML/JavaScript and load the page in any browser (Puppeteer is built around Chrome/Chromium, and Selenium needs a separate WebDriver for each browser). A local proxy server would, of course, still be vital, if only to bypass the CORS restrictions (though a good proxy needs to do much more than that). The bonus of running JavaScript in the browser is the ability to use eval (an important tool for cracking code obfuscation) and, of course, to parse JSON and HTML. Also, you can solve the Cloudflare bot challenge (or any bot challenge) by running the challenge in a new tab and stealing the cookies via the proxy server. Just asking because I'm already doing this. I mentioned this months ago but I'm still preparing the tools for official release.
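To make the idea concrete, here is a rough sketch of what scraping from inside the page might look like. It is not the unreleased tooling mentioned above. It assumes a hypothetical local proxy at `http://localhost:8080/` that takes the target address in a `url` query parameter and returns the response with CORS headers added. The names `PROXY_BASE`, `proxiedUrl`, `extractLinks` and `scrape` are made up for illustration.

```javascript
// Hypothetical local proxy address (an assumption, not a real tool's default).
const PROXY_BASE = "http://localhost:8080/";

// Build the URL that asks the local proxy to fetch `target` on our behalf.
// The `url` query parameter is an assumed convention for this sketch.
function proxiedUrl(target, proxyBase = PROXY_BASE) {
  return proxyBase + "?url=" + encodeURIComponent(target);
}

// Pull every link out of an HTML string using the browser's own parser.
// DOMParser exists in browsers only, so this part will not run under Node.
function extractLinks(html, baseUrl) {
  const doc = new DOMParser().parseFromString(html, "text/html");
  return Array.from(
    doc.querySelectorAll("a[href]"),
    (a) => new URL(a.getAttribute("href"), baseUrl).href
  );
}

// Fetch a page through the proxy, then parse it as JSON or HTML
// depending on the Content-Type the proxy passes back.
async function scrape(target, fetchImpl = fetch) {
  const response = await fetchImpl(proxiedUrl(target));
  if (!response.ok) {
    throw new Error("Proxy returned HTTP " + response.status);
  }
  const type = response.headers.get("content-type") || "";
  if (type.includes("application/json")) {
    return response.json();
  }
  return extractLinks(await response.text(), target);
}
```

From a script in a locally loaded HTML page this could be called as `scrape("https://example.com/").then(console.log)`. Passing `fetchImpl` as a parameter is only a convenience so the function can be exercised with a stub.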