Regarding the first point. If I understand correctly you say that there is an inevitable performance hit when not running in kernel mode. But is that really so?
I don't like that you were downvoted for curiosity.
Yes, there is a performance hit but how large it is depends a lot on what you're doing.
"Switching" (crossing between user mode and kernel mode) is one of the most costly operations, and in kernel mode you do not need to do it unless you are interacting with something in user space, which you would only do because something in user space requested it.
For other things, such as virtual memory, Microsoft found that the protections needed for virtual memory could cost anywhere between 10 and 20%, but since there's no concept of virtual memory in kernel space, it's hard to say concretely that your program would be "20% faster". It would be too much of a different program.
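To get a rough feel for the cost of crossing into the kernel, here is a minimal sketch that reads the same temporary file twice: once one byte per `os.read` call and once in a single large call. The function names (`read_in_chunks`, `compare`) are made up for illustration, and Python's interpreter overhead is mixed into the timings, so treat the numbers as an indication of per-call cost rather than a benchmark.

```python
import os
import tempfile
import time


def read_in_chunks(path, chunk_size):
    """Read a whole file with os.read, making one kernel call per chunk."""
    fd = os.open(path, os.O_RDONLY)
    total = 0
    calls = 0
    try:
        while True:
            data = os.read(fd, chunk_size)  # each call enters the kernel
            calls += 1
            if not data:  # empty result means end of file
                break
            total += len(data)
    finally:
        os.close(fd)
    return total, calls


def compare(size=1 << 18):
    """Return {chunk_size: (bytes_read, calls, seconds)} for tiny vs. large reads."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\0" * size)
        path = f.name
    try:
        results = {}
        for chunk in (1, size):
            start = time.perf_counter()
            total, calls = read_in_chunks(path, chunk)
            results[chunk] = (total, calls, time.perf_counter() - start)
        return results
    finally:
        os.unlink(path)
```

Calling `compare()` moves the same number of bytes in both runs; the difference in elapsed time comes almost entirely from how many times the program crossed into the kernel.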
The whole point of dpdk is that it bypasses the kernel to obtain speed. It almost certainly requires a lot of privileges, but the kernel itself isn't the source of the hit.
The cost is in the switches along the driver -> kernel -> network consumer path, since running TCP and routing packets to the correct process are both done by the kernel.
If you run dpdk you get raw Ethernet frames and run TCP yourself. This means that a program can receive any data sent to the machine. More sophisticated cards can do routing in hardware and present multiple "virtual" cards, but this is not yet commonly available in consumer hardware.
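As a sketch of what "get raw frames and do it yourself" means, the snippet below decodes an Ethernet/IPv4/TCP frame by hand and picks a consumer by destination port, which is the demultiplexing the kernel normally does. It does not use any dpdk API. The sample frame is synthetic, the checksums are left at zero, and the listener names are hypothetical. It also assumes no VLAN tags and ignores everything a real TCP stack has to do (state, retransmission, reassembly).

```python
import struct

ETHERTYPE_IPV4 = 0x0800
IPPROTO_TCP = 6


def parse_frame(frame: bytes):
    """Return (src_ip, dst_ip, src_port, dst_port, payload), or None if not IPv4/TCP."""
    if len(frame) < 14:
        return None
    _dst_mac, _src_mac, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != ETHERTYPE_IPV4:
        return None
    ip = frame[14:]
    if len(ip) < 20 or ip[0] >> 4 != 4:
        return None
    ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
    if ihl < 20 or len(ip) < ihl or ip[9] != IPPROTO_TCP:
        return None
    total_length = struct.unpack("!H", ip[2:4])[0]
    src_ip = ".".join(str(b) for b in ip[12:16])
    dst_ip = ".".join(str(b) for b in ip[16:20])
    tcp = ip[ihl:total_length]
    if len(tcp) < 20:
        return None
    src_port, dst_port = struct.unpack("!HH", tcp[:4])
    data_offset = (tcp[12] >> 4) * 4  # TCP header length in bytes
    if data_offset < 20 or len(tcp) < data_offset:
        return None
    return src_ip, dst_ip, src_port, dst_port, tcp[data_offset:]


def dispatch(frame: bytes, listeners: dict):
    """Pick a consumer by destination port, as the kernel would for sockets."""
    parsed = parse_frame(frame)
    if parsed is None or parsed[3] not in listeners:
        return None  # nobody wants it: a real stack would drop or reset
    return listeners[parsed[3]], parsed[4]


def build_sample_frame(dst_port: int, payload: bytes) -> bytes:
    """Build a synthetic frame for the demo (checksums deliberately left at 0)."""
    eth = struct.pack("!6s6sH", b"\x02" * 6, b"\x02\x00\x00\x00\x00\x01", ETHERTYPE_IPV4)
    tcp = struct.pack("!HHIIBBHHH", 40000, dst_port, 0, 0, 5 << 4, 0x18, 65535, 0, 0) + payload
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp), 0, 0, 64,
                     IPPROTO_TCP, 0, bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    return eth + ip + tcp
```

With a table such as `{80: "web-worker"}`, `dispatch(build_sample_frame(80, b"hello"), table)` hands the payload to the matching name. Note that the process doing this sees every frame, including those meant for ports it does not own.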
dpdk is not general purpose networking. It's passing a device to a process and forgetting about it; the process itself needs to decode what is sent on the wire and make sense of it.
It's basically the kernel giving up on trying to do anything with the hardware, thus it's not available to any other process except the one that takes the hardware.
To have general-purpose networking in user space you will end up with either some other IPC that does not rely on sockets (because sockets live in the kernel) or shared memory, which is dangerous as hell.
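To illustrate why shared memory is risky, here is a minimal sketch of a single length-prefixed message slot using Python's `multiprocessing.shared_memory`. The slot layout and the helper names are assumptions made up for this example. Both handles live in one process only to keep the demo short, and there is no locking or memory ordering, which a real design would need. The point is that the reader has to distrust everything it finds in the segment, because the peer can scribble anything there.

```python
import struct
from multiprocessing import shared_memory

HEADER = struct.Struct("<I")  # 4-byte little-endian payload length
SLOT_SIZE = 256               # hypothetical fixed slot size


def write_message(buf, payload: bytes) -> None:
    """Copy the payload into the slot, then write its length."""
    if len(payload) > SLOT_SIZE - HEADER.size:
        raise ValueError("payload too large for slot")
    buf[HEADER.size:HEADER.size + len(payload)] = payload
    HEADER.pack_into(buf, 0, len(payload))


def read_message(buf) -> bytes:
    """Read one message, refusing a length field that cannot be valid."""
    (length,) = HEADER.unpack_from(buf, 0)
    if length > SLOT_SIZE - HEADER.size:
        raise ValueError("corrupt length field")
    return bytes(buf[HEADER.size:HEADER.size + length])


def demo():
    """Return (message_read, bad_length_rejected)."""
    writer = shared_memory.SharedMemory(create=True, size=SLOT_SIZE)
    reader = shared_memory.SharedMemory(name=writer.name)  # second view of the same bytes
    try:
        write_message(writer.buf, b"ping")
        message = read_message(reader.buf)
        # Simulate a buggy or hostile peer overwriting the header.
        HEADER.pack_into(writer.buf, 0, 0xFFFFFFFF)
        try:
            read_message(reader.buf)
            rejected = False
        except ValueError:
            rejected = True
        return message, rejected
    finally:
        reader.close()
        writer.close()
        writer.unlink()
```

Without the bounds check in `read_message`, a bad length written by the other side would make the reader walk past the slot, which is the kind of bug a kernel-mediated socket would not let one process inflict on another.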