A lot of that has already been solved by scaling workers to cores, along with techniques like greenlets/eventlets that support concurrency without true multithreading and make better use of CPU capacity.
But you are still more or less limited to one CPU core per Python process. Yes, you can use that core more efficiently, but a single process still can't scale up across cores.
Yes, multiple worker processes is what I meant. Few web apps have a meaningful use for parallelism within a single process. So long as you’re keeping all cores busy with independent processes at high concurrency, multithreading adds relatively little.
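For what it's worth, the process-per-core idea can be sketched with just the standard library. This is only an illustration under my own assumptions, not how any particular app server does it: `cpu_bound` and `run_on_all_cores` are made-up names, and the loop is a stand-in for real request handling.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def cpu_bound(n: int) -> int:
    """Hypothetical stand-in for the CPU work done while handling one request."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_on_all_cores(jobs: list[int]) -> list[int]:
    # One worker process per core. Each process has its own interpreter
    # (and its own GIL), so the jobs can run in parallel across cores.
    workers = os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(cpu_bound, jobs))


if __name__ == "__main__":
    # Eight independent jobs, spread over however many cores are available.
    print(run_on_all_cores([200_000] * 8))
```

Real deployments usually let the server (a pre-fork worker model, for example) manage the processes rather than doing it in application code, but the shape is the same: independent processes, roughly one per core.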