The async version also doesn't spawn as many actors/threads/whatever as the other implementations. It only runs the `Task.WhenAll` continuation on the ThreadPool, and probably not even that, because it can recognize that all child tasks are already complete and run the continuation synchronously. The leaf tasks, which are by far the most numerous, are just synchronous function calls that return an already-completed task instead of spawning anything. A fair comparison should at least use something like `Task.Run()`, which would make it a lot slower. And then you could make it a lot faster again with a custom TaskScheduler.
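To make the leaf-task point concrete, here is a rough sketch of the two variants. It is not the benchmark's actual code: the tree shape (fan-out of 10, 1000 leaves) and the names `TreeSumSketch`, `LeafCompleted`, `LeafScheduled` and `Sum` are made up for illustration. Only `Task.FromResult`, `Task.Run` and `Task.WhenAll` are real .NET APIs.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

static class TreeSumSketch // hypothetical name, not from the benchmark
{
    // Leaf as a plain synchronous call: returns an already-completed task,
    // so nothing is queued or spawned.
    static Task<long> LeafCompleted(long num) => Task.FromResult(num);

    // Leaf that is actually queued to the ThreadPool as a work item.
    static Task<long> LeafScheduled(long num) => Task.Run(() => num);

    // Recursively splits the range into `div` children until size == 1,
    // then sums the children's results via Task.WhenAll.
    static async Task<long> Sum(long num, int size, int div, Func<long, Task<long>> leaf)
    {
        if (size == 1) return await leaf(num);

        int childSize = size / div;
        Task<long>[] children = Enumerable.Range(0, div)
            .Select(i => Sum(num + (long)i * childSize, childSize, div, leaf))
            .ToArray();

        long[] results = await Task.WhenAll(children);
        return results.Sum();
    }

    static async Task Main()
    {
        // Assumed sizes, chosen only to keep the sketch small.
        long completed = await Sum(0, 1000, 10, LeafCompleted);
        long scheduled = await Sum(0, 1000, 10, LeafScheduled);
        Console.WriteLine($"{completed} {scheduled}"); // both sums are equal
    }
}
```

Both calls compute the same sum. The difference is only in how much work gets handed to the scheduler, which is exactly the knob that changes the timings.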
I think there are far too many knobs to turn for this comparison to be meaningful.
That was a quick update! Looks better now. I would have expected the difference to be much bigger. I did a similar comparison about a year ago (with normal .NET), and it was slower than Go. However, with a custom actor implementation (similar to the Scalaz implementation), it was a lot faster. F# with the Hopac library also performed very well.