It buys you the stack size of each thread which only matters if you have a stupid amount of connections.
In this article[1] the author makes a comparison between the 2 models and 7000 concurrent users will chew up 450MB of stack space. Of course this is adjustable.
On most Linux systems stack is allocated with mmap with overcommiting. Until first write all those pages will share same zeroed page AFAIK. Then only overwritten pages will be allocated.
I think the idea is that these "coroutine objects" (or the equivalent structure in whatever language) is smaller than the typical stack size for a thread. For example, the default stack size on Windows is 1 MB. So if you have a thread per connection, obviously this is going to take up a decent amount of memory. I'm guessing the answer to this is a thread pool so your memory usage doesn't blow up.
Am I wrong?