Python 3's asyncio and aiohttp do the job for me -- I can crawl several times as fast as the article's 111 qps with just a single process. https://github.com/cocrawler/cocrawler
Of course, it matters what you're doing with the page content, and how you're managing your metadata, and all that.
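For anyone curious what the single-process approach looks like, here is a minimal sketch, not cocrawler's actual code. The names `crawl`, `fetch`, `main` and the `max_concurrency` default are made up for illustration. The idea is just a semaphore capping in-flight requests, with `asyncio.gather` running them all on one event loop.

```python
import asyncio


async def crawl(urls, fetch, max_concurrency=100):
    """Run fetch(url) for every URL, with at most max_concurrency in flight.

    fetch is any coroutine function taking a URL (hypothetical helper).
    Returns (url, result) pairs in input order; a failed fetch yields
    (url, exception) instead of aborting the whole crawl.
    """
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with sem:  # wait here if too many requests are in flight
            try:
                return url, await fetch(url)
            except Exception as exc:
                return url, exc

    return await asyncio.gather(*(bounded(u) for u in urls))


async def main(urls):
    import aiohttp  # third-party; imported here so crawl() works without it

    # One shared session reuses connections across requests.
    async with aiohttp.ClientSession() as session:

        async def fetch(url):
            async with session.get(url) as resp:
                return resp.status, await resp.text()

        return await crawl(urls, fetch)
```

Calling `asyncio.run(main(list_of_urls))` would kick it off. A real crawler also needs per-host politeness, robots.txt handling, timeouts and retries, which is where most of the work goes.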
You're right, asyncio and aiohttp are a great combination, and I've used both before. aiohttp uses less memory than requests, but it doesn't support HTTPS proxies. That's the only thing about it that isn't so perfect.
Besides, asyncio does much the same job as Celery's concurrency. I'll run a test to see which one performs better.