Ask HN: Why Is My Scraping So Slow?
3 points by scottmas 1132 days ago
At my company, we scrape our Webflow marketing website and host it ourselves on Cloudflare to avoid their crazy enterprise plan pricing. I have a little Node.js script that gets the job done, but it's really slow (5 to 10 minutes). For the life of me I cannot figure out how to speed up the scraping process. For example, when I scrape it locally I can only get a maximum of about 300kb/second no matter how much I try to parallelize requests, even though I have 200 Mbps of bandwidth. It's just annoying for our marketing team to have such a long delay between publishing changes and seeing them deployed live. Am I getting hit with some sort of Cloudfront rate limiting by IP address? Is there some socket limit at a really low level that I'm hitting on both my local Mac and the Linux box I do the scraping on? What are the best ways I can speed things up?
It may also be that Webflow rate limits bot traffic? Try spoofing the user agent with that of a popular browser [1].
But why scrape? Webflow lets you export the code [2]. It may still require a premium subscription, though; I haven't looked thoroughly.
[1] https://techblog.willshouse.com/2012/01/03/most-common-user-...
[2] https://university.webflow.com/lesson/code-export
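To make the user-agent suggestion concrete, here is a rough sketch, not a drop-in for the original script. It assumes Node.js 18+ (for the global `fetch`). The names `BROWSER_UA`, `fetchPage` and `mapWithConcurrency` are made up for illustration, and the user-agent string is just an example value you would replace with a current one. It sends a browser-like `User-Agent` on every request and caps how many requests are in flight at once.

```javascript
// Sketch only. Assumes Node.js 18+ (global fetch). All names are illustrative.

// Example browser-like User-Agent string (an assumption; swap in a current one).
const BROWSER_UA =
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';

// Fetch one page as text, sending the browser-like User-Agent.
// fetchImpl is injectable so the function can be tried without a network.
async function fetchPage(url, fetchImpl = fetch) {
  const res = await fetchImpl(url, { headers: { 'User-Agent': BROWSER_UA } });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.text();
}

// Run `worker` over `items` with at most `limit` calls in flight at a time.
// Results are returned in the same order as the input.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  // Each runner repeatedly claims the next unprocessed item until none remain.
  async function run() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, run);
  await Promise.all(runners);
  return results;
}
```

With a hypothetical `urls` array, usage would look something like `const pages = await mapWithConcurrency(urls, 8, (u) => fetchPage(u));`. If throughput stays at the same ceiling with a browser user agent and different values of `limit`, the bottleneck is probably somewhere other than user-agent based throttling.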