Hacker News new | ask | show | jobs
by joshlk 2053 days ago
That would be a good idea, but it's non-trivial in Python as multi-processing async support is patchy. If you have to choose multi-processing vs async for HTTP requests, from experience, async is much faster (as your not CPU bound but IO bound) and easier to use.
1 comments

I do both when I'm scaling to millions of requests per day. I also do some cpu bound preprocessing after the io bound async functions end.
Look into Go, specifically the colly pkg. Learnt Go just so I could use this library after wrestling with highly parralelized Python web scraping. The 1k/requests/sec/core figure is true and frankly deadly, and Go borrows a lot of useful stuff from Python like array slicing.

It was pretty remarkable to go from 32 cores being maxed for 100 concurrent workers to... A 10-15% usage spike on a 4 core XPS :)

https://github.com/gocolly/colly