At high concurrency, I'd argue you will probably end up being more CPU-efficient as well. The cost of context switching (saving and restoring what are effectively larger frames, plus getting into and out of privileged mode) can get expensive.
I don't know exactly what you mean by high concurrency, but if you mean "a high number of tasks to work on in parallel", then a ThreadPool will eliminate the problems you are describing.
It probably depends, but the cost of creating a heap-allocated closure, running it, and deallocating it can be quite high compared to the cost of two context switches.
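To make the per-task cost concrete, here is a minimal sketch of the kind of thread pool being discussed, assuming a design loosely modeled on the one in The Rust Programming Language book. `ThreadPool`, `Job`, and `sum_of_squares_via_pool` are hypothetical names for illustration, not any particular crate's API. Each submitted task is boxed (`Box::new(f)`), so every task pays for one heap allocation plus the matching free after it runs; an idle worker blocked in `recv()` also has to be woken to pick the task up. Whether that is cheaper or dearer than two context switches depends on the allocator and workload, which is the point of contention above.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

// Hypothetical pool: each task is a boxed closure, so submitting one costs
// a heap allocation, and running it ends with the Box being freed.
type Job = Box<dyn FnOnce() + Send + 'static>;

pub struct ThreadPool {
    workers: Vec<JoinHandle<()>>,
    sender: Option<Sender<Job>>,
}

impl ThreadPool {
    pub fn new(size: usize) -> ThreadPool {
        assert!(size > 0);
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver: Arc<Mutex<Receiver<Job>>> = Arc::new(Mutex::new(receiver));
        let workers = (0..size)
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The MutexGuard is a temporary, dropped at the end of this
                    // statement, so the lock is held only while taking one job.
                    let job = receiver.lock().unwrap().recv();
                    match job {
                        Ok(job) => job(), // run the closure; its Box is freed afterwards
                        Err(_) => break,  // channel closed: the pool is shutting down
                    }
                })
            })
            .collect();
        ThreadPool { workers, sender: Some(sender) }
    }

    pub fn execute<F: FnOnce() + Send + 'static>(&self, f: F) {
        // Box::new(f) is the per-task heap allocation.
        self.sender.as_ref().unwrap().send(Box::new(f)).unwrap();
    }
}

impl Drop for ThreadPool {
    fn drop(&mut self) {
        drop(self.sender.take()); // close the channel so idle workers exit
        for worker in self.workers.drain(..) {
            worker.join().unwrap();
        }
    }
}

// Hypothetical demo: square 0..n on the pool and sum the results.
fn sum_of_squares_via_pool(threads: usize, n: u64) -> u64 {
    let pool = ThreadPool::new(threads);
    let (tx, rx) = mpsc::channel();
    for i in 0..n {
        let tx = tx.clone();
        pool.execute(move || tx.send(i * i).unwrap());
    }
    drop(tx); // rx.iter() ends once every task's sender clone is dropped
    rx.iter().sum()
}

fn main() {
    println!("{}", sum_of_squares_via_pool(4, 8));
}
```

With 4 workers and `n = 8`, this prints `140` (0 + 1 + 4 + ... + 49). A pool like this avoids spawning a thread per task, but it does not avoid the boxing; avoiding that would need something like a fixed task type instead of `dyn FnOnce`.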