by vivzkestrel 58 days ago
- i saw your other comment that talks about using an open source dataset but i had to ask

- how would you actually go about loading reviews if you really wanted to

- what kind of system would you need to get around the CAPTCHA and other bot detection

1 comment

i would probably use Playwright with custom code, split the product list into chunks of similar products, then run those chunks in parallel on a large cluster (https://github.com/Burla-Cloud/burla).

if you have a single worker trying to scrape a shit ton of products back to back, you're going to get rate limited or their bot detection will catch you.
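
to make the chunk-then-parallelize idea concrete, here's a rough sketch, not a working scraper. everything named here is an assumption for illustration: the `category` field as the meaning of "similar products", the `chunk_by_category` / `scrape_chunk` / `scrape_all` helpers, and the sample data. `scrape_chunk` is a placeholder where the Playwright code would go, and a local `ThreadPoolExecutor` stands in for the cluster, so this only shows the shape of the fan-out.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def chunk_by_category(products, chunk_size):
    """Group products by category, then split each group into chunks.

    Assumption: "similar products" means "same category field".
    """
    groups = defaultdict(list)
    for product in products:
        groups[product["category"]].append(product)

    chunks = []
    for items in groups.values():
        for start in range(0, len(items), chunk_size):
            chunks.append(items[start:start + chunk_size])
    return chunks


def scrape_chunk(chunk):
    """Placeholder worker: returns an empty review list per product.

    A real version would open a browser (e.g. with Playwright) and
    visit each product page in the chunk.
    """
    return [{"id": product["id"], "reviews": []} for product in chunk]


def scrape_all(products, chunk_size=2, max_workers=4):
    """Fan chunks out to workers so no single worker hits every product."""
    chunks = chunk_by_category(products, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input chunks
        per_chunk = list(pool.map(scrape_chunk, chunks))
    return [row for rows in per_chunk for row in rows]


# Hypothetical sample data, only for illustration.
PRODUCTS = [
    {"id": 1, "category": "a"},
    {"id": 4, "category": "b"},
    {"id": 2, "category": "a"},
    {"id": 3, "category": "a"},
    {"id": 5, "category": "b"},
]

if __name__ == "__main__":
    for row in scrape_all(PRODUCTS):
        print(row)
```

the point of the design is that each worker only ever sees a small chunk, so the per-worker request rate stays low. on a real cluster you'd swap the thread pool for whatever parallel map the cluster library provides.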