Hacker News new | ask | show | jobs
by wumpus 3402 days ago
Python 3's asyncio and aiohttp do the job for me -- I can crawl several times as fast as the article's 111 qps with just a single process. https://github.com/cocrawler/cocrawler

Of course, it matters what you're doing with the page content, and how you're managing your metadata, and all that.

1 comments

Thanks for your comment.

You are right, asyncio and aiohttp is a great combination. I've used them both before. Though aiohttp cost less memory than requests, aiohttp does't support https proxy. This is the only one that isn't so perfect.

Besides , asyncio is something the same as concurrency of celery. I will have a test which one will have a better performance.

Thanks for your comment again.