Is that worth the complexity? I like the way Go does it. The runtime is managing the actual threading and async IO as it schedules go routines, but to the programmer it all looks like normal synchronous code.
thread per request doesn't imply kernel threads. And I'm pretty sure even a kernel thread per request in a faster language is going to be better than asincio in python.