It's not just I/O, it's also mutexes, condition variables, time, etc. It's not horrible, but it does add up, so calling it super efficient is a stretch.
A modern allocator with per-thread cache can satisfy some allocations in 20-30 cycles - dynamic dispatch can easily double that, even if the target is still in cache.
It's one of these things where it's extremely use case dependant - like many performance issues, you probably don't care about it - but when you do it matters.
Inderect call cost is a few cycles, if predicted. Now, you can argue, that it may be mispredicted and misprediction would cost about 20-30 cycles. But if it is mispredicted, then you are not calling into allocator often enough. And if you don't hammer it hard, then why do you care about preformance?
I believe their plan is using "restricted function pointers", where you can specify that a pointer will only ever be to a function defined in the codebase. I'm pretty sure they also have plans for devirtualization, but I haven't followed super closely.
Oh, is it not a specific keyword? I thought they were thinking of it being a keyword so you could be sure that it was restricted, in case a variable or function was exported that took in a foreign pointer.