by zepolen 3402 days ago
You could also use multiprocessing. I got about 500 req/s against an endpoint returning a 'hello world' response (which is what the article also does). The article gets about 300 req/s, but that's because he saturates his pipe. In reality, the article's approach might be faster than 1,000,000/hour.

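    from multiprocessing import Pool
    from requests import get

    urls = 1000 * ['http://localhost/hello']

    def scrape(url):
        # Fetch one URL and return the response body as text.
        return get(url).text

    if __name__ == '__main__':
        # 40 worker processes; the guard is needed on platforms that spawn workers.
        with Pool(40) as p:
            results = p.map(scrape, urls)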
~2.2 seconds on a dual-core 2.2 GHz machine
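
For anyone who wants to check the arithmetic behind these figures, here is a quick sketch. The helper names are made up for illustration; the inputs are the 1000 requests in ~2.2 seconds from this run and the ~300 req/s attributed to the article.

```python
def requests_per_second(n_requests, seconds):
    """Average throughput over a timed run."""
    return n_requests / seconds


def requests_per_hour(req_per_s):
    """Scale a per-second rate to an hourly rate."""
    return req_per_s * 3600


print(round(requests_per_second(1000, 2.2)))  # 455
print(requests_per_hour(300))                 # 1080000
```

So 300 req/s sustained already works out to slightly over a million requests per hour.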

Thanks for your comment. If you ran the same test against a cloud server or a public website, the throughput would probably drop somewhat.

I've used multiprocessing/threads/gevent/asyncio before, and I will run a full test with these libraries.

Thanks again!

As I said, the benchmark is flawed since it's dependent on the network pipe. It would be a good idea to run tests locally so you get a real maximum.
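
A rough sketch of what a local test could look like, so the network pipe is out of the picture. This is not the article's code or the snippet from the parent comment: it swaps in the standard library (`http.server`, `urllib`, a thread pool) so it runs without extra installs, and `HelloHandler`, `fetch`, and `run_local_benchmark` are hypothetical names. The request and worker counts are arbitrary assumptions.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import ProxyHandler, build_opener

# Opener that ignores any proxy environment variables, so requests stay local.
_opener = build_opener(ProxyHandler({}))


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny handler that answers every GET with 'hello world'."""

    def do_GET(self):
        body = b"hello world"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging so it does not skew timing


def fetch(url):
    """Fetch one URL and return the body as text."""
    with _opener.open(url) as resp:
        return resp.read().decode()


def run_local_benchmark(n_requests=200, n_workers=8):
    """Serve on a free local port, fetch n_requests times, return (results, req/s)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/hello"
    try:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            results = list(pool.map(fetch, [url] * n_requests))
        elapsed = time.perf_counter() - start
    finally:
        server.shutdown()
        server.server_close()
    return results, n_requests / elapsed


if __name__ == "__main__":
    results, rate = run_local_benchmark()
    print(f"{len(results)} responses, {rate:.0f} req/s")
```

The number you get depends on the machine and on the toy server itself, so treat it as a ceiling for this particular setup rather than a figure to compare against the ones quoted here.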

There are lots of factors that can completely skew benchmarks. For example, if you were scraping an average 10 KB response instead of 'hello world', you would automatically be limited to roughly 100 req/s on a 10 Mbit/s pipe.
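
The bandwidth ceiling is easy to work out. A minimal sketch, assuming decimal units (1 Mbit = 1,000,000 bits, 1 KB = 1,000 bytes) and ignoring HTTP/TCP overhead; `max_requests_per_second` is a made-up helper name:

```python
def max_requests_per_second(link_mbit_per_s, response_kb):
    """Upper bound on req/s when the link is the only bottleneck (no overhead)."""
    link_bytes_per_s = link_mbit_per_s * 1_000_000 / 8
    response_bytes = response_kb * 1_000
    return link_bytes_per_s / response_bytes


print(max_requests_per_second(10, 10))  # 125.0
```

That is the theoretical limit; once you account for headers and protocol overhead, something closer to 100 req/s is a realistic figure.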