

by mnmkng 1438 days ago

The ideal approach depends on your architecture. Creating new queues on the Apify platform is really easy and cheap (we create ~500k every day), so we usually run one crawler per domain. That performs the best and it's the easiest to set up.

At the Crawlee level, you can open a new queue with one line of code and name it after the hostname. So the most straightforward solution would be to run multiple crawler instances, each with its own queue, rate limit them using the options explained here: https://crawlee.dev/docs/guides/scaling-crawlers, and push new URLs to the respective queue based on each URL's hostname.
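To make the routing part concrete, here is a minimal sketch of how URLs could be grouped by hostname before they are pushed to per-domain queues. The helpers `queueNameForUrl` and `groupUrlsByQueue` are hypothetical names made up for this example, not part of Crawlee, and replacing dots with hyphens is an assumption about which characters are safe in a queue name.

```typescript
// Hypothetical helpers (not part of Crawlee): derive a queue name from a URL's
// hostname and group URLs so each group can go to its own per-domain queue.

function queueNameForUrl(url: string): string {
    const { hostname } = new URL(url);
    // Assumption: queue names should contain only letters, digits and hyphens,
    // so every other character (such as the dots in a hostname) becomes a hyphen.
    return hostname.toLowerCase().replace(/[^a-z0-9-]/g, '-');
}

function groupUrlsByQueue(urls: string[]): Map<string, string[]> {
    const groups = new Map<string, string[]>();
    for (const url of urls) {
        const name = queueNameForUrl(url);
        const bucket = groups.get(name) ?? [];
        bucket.push(url);
        groups.set(name, bucket);
    }
    return groups;
}

const groups = groupUrlsByQueue([
    'https://example.com/a',
    'https://example.com/b',
    'https://docs.example.org/start',
]);
// Each entry pairs a queue name with the URLs that belong in that queue.
for (const [name, urls] of groups) {
    console.log(name, urls.length);
}
```

In a real crawler, each name would probably be passed to `RequestQueue.open(name)`, the resulting queue handed to its own crawler through the `requestQueue` option, and the per-domain rate set with options such as `maxRequestsPerMinute` or `maxConcurrency` from the scaling guide linked above.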

If you'd like to discuss this in more depth, you can join our Discord or ask in GitHub discussions. Both are linked from the Crawlee homepage.