by stuffoverflow
451 days ago
There are definitely tools for scraping basically any site by driving the browser itself, which makes sure all dynamically loaded content gets intercepted correctly. Browsertrix[0] is probably the best known and most complete scraper of that kind. They offer it as a paid service for convenient setup, but it's open source and can be self-hosted as well.

[0]: https://webrecorder.net/browsertrix/

Does anyone have experience self-hosting this in the cloud? I'd worry about runaway traffic costs, but since ingress is cheap most of the time, maybe this isn't a big problem?