when you say parallel you mean the work is distributed across multiple cores? e.g. multiple python worker processes. Or is all work running single threaded in a single process (concurrent IO, but not true parallelism)?
Why not distribute the concurrent work over multiple processes to get the most out of multiple cores? More event loops, better performance, and maybe more text preprocessing.
That would be a good idea, but it's non-trivial in Python as multi-processing async support is patchy. If you have to choose multi-processing vs async for HTTP requests, from experience, async is much faster (as your not CPU bound but IO bound) and easier to use.
Look into Go, specifically the colly pkg. Learnt Go just so I could use this library after wrestling with highly parralelized Python web scraping. The 1k/requests/sec/core figure is true and frankly deadly, and Go borrows a lot of useful stuff from Python like array slicing.
It was pretty remarkable to go from 32 cores being maxed for 100 concurrent workers to... A 10-15% usage spike on a 4 core XPS :)