The titular post links to an earlier comparison of approaches, and plain OS threads used from Rust fare quite well in it, even though the author doesn't raise the OS limits that would keep threads in the running at the higher thread counts: https://pkolaczk.github.io/memory-consumption-of-async/