If you're going to argue this, then I agree with tptacek: you must also think it's an architectural problem that filesystems cannot all be implemented over FUSE with performance equivalent to their kernel versions.
Yes, TUN is not 0-copy. Is there an equivalent that's as fast as the kernel in any other major OS? (I'm asking that honestly, I have no idea.) If there is, is in-kernel networking on that OS as fast as on Linux?
The Linux network stack is what it is. There have been recent improvements for it (e.g. DPDK), but I think it can't be that bad given that it powers most of the Internet.
It doesn't have to be "over FUSE". There are microkernels out there like QNX and L4 which solve the filesystem-in-userspace problem, are extremely fast, and have done this for decades. You likely have a realtime L4 instance in your phone doing the real work of talking to the phone network. Heck, even my Nintendo Switch has a simple microkernel OS, with an unspectacular ARM processor, and it runs games, one of the most performance-sensitive applications that regular people use.
Right, but L4 isn't even an operating system; it's an OS fabric, on which OS personalities are built. Nobody doubts you can build a microkernel architecture where filesystems and VPNs are equivalently performant in userland and the kernel, but nobody has managed to do that for a mainstream OS.
It's telling that xnu started out on Mach with a FreeBSD "personality", and grew into another shaggy monolith.
The L4 in your baseband or your enclave is not running anything resembling a general purpose operating system.
So I guess my argument is: sure, you could design and build VPNOS, the OS where VPNs are fast and never need to touch the kernel. But nobody wants that OS.
I won't argue against you here. I like microkernel design à la QNX, especially with synchronous message passing (MsgSend), and I used to run Minix3 as my main OS some years ago. It relies on a completely different scheduler model though, with different trade-offs. And you can't always avoid copies.
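For anyone who hasn't seen QNX-style synchronous message passing, here is a rough Python sketch of the send/receive/reply pattern. It is only a toy model: the `Channel` class, its method names, and the pretend filesystem server are made up for illustration and are not the QNX API (the real calls are the C functions MsgSend, MsgReceive, and MsgReply). The point it tries to show is that the client stays blocked until the server replies, which is why the scheduler model matters so much.

```python
import queue
import threading


class Channel:
    """Toy synchronous send/receive/reply channel (hypothetical, not the QNX API)."""

    def __init__(self):
        self._requests = queue.Queue()

    def msg_send(self, payload):
        # The client blocks here until the server replies.
        reply_box = queue.Queue(maxsize=1)
        self._requests.put((payload, reply_box))
        return reply_box.get()

    def msg_receive(self):
        # The server blocks here until a client sends something.
        return self._requests.get()

    @staticmethod
    def msg_reply(reply_box, result):
        # Unblocks the client that is waiting in msg_send.
        reply_box.put(result)


def fs_server(chan):
    """Pretend userspace filesystem server: answers read requests from a dict."""
    files = {"/etc/motd": b"hello from userspace\n"}
    while True:
        payload, reply_box = chan.msg_receive()
        if payload is None:  # shutdown request
            Channel.msg_reply(reply_box, None)
            return
        Channel.msg_reply(reply_box, files.get(payload, b""))


def demo():
    chan = Channel()
    server = threading.Thread(target=fs_server, args=(chan,))
    server.start()
    data = chan.msg_send("/etc/motd")  # blocks until the server replies
    missing = chan.msg_send("/nope")   # unknown path, server replies with b""
    chan.msg_send(None)                # ask the server to exit
    server.join()
    return data, missing
```

Note that the payload is handed over by reference here. In a real system the message usually has to be copied between address spaces, which is the "can't always avoid copies" part.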
Also, QNX has never seen usage as a general purpose OS on a PC, even though I remember trying a QNX Photon Live CD years ago. I wonder what would work well, and what wouldn't. In particular, are there security issues related to the use of message passing for drivers...
> Also, QNX has never seen usage as a general purpose OS on a PC.
It has. QNX 6.2.1 offered a full desktop environment. I ran it as my primary OS for three years (2003-2005) while working on a DARPA Grand Challenge vehicle. The vehicle itself ran QNX, and development was also on QNX. An early Firefox and Thunderbird both ran well. The Eclipse IDE ran. All the command-line GCC tools ran.
It worked like a typical UNIX/Linux system, but with more consistent response. No swapping. I could run the real-time vehicle code while compiling or web browsing. That consistency in response time made QNX a nice desktop OS.
It disappeared from the desktop after BlackBerry took it over and made QNX closed source again. (For several years, all the source was online. Then one day BlackBerry took it down, with no warning.) All the open source projects then stopped supporting QNX.
QNX development is now cross-compiled from Windows.
With a small microkernel with a good track record, there's no churn. There's no new kernel every week. The QNX kernel had an update once a year or so. This is a big win when it controls your nuclear reactor.
I mean, there's no churn in "the thing called your kernel", but that doesn't mean there isn't churn in the TCB (trusted computing base), which is what you care about in security design. There's no reason the TCB in a microkernel would necessarily be any smaller than that of a monolith.
Well, in the case of WireGuard, if it is used as a userspace app, you would need to copy every invalid (by WG's definition) packet from kernel space to user space just to discard it, instead of dropping it in kernel space. You would end up copying a lot of packets that never need to reach user space, instead of just processing them in the kernel.
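A toy model of that cost, in Python, for illustration only. Everything here is an assumption made for the sketch: `is_valid` is a made-up prefix check (real WireGuard decides validity cryptographically), and the "copy" is just a byte counter standing in for the kernel-to-user transfer.

```python
def is_valid(packet: bytes) -> bool:
    # Placeholder check (assumption): real WireGuard authenticates packets
    # cryptographically, it does not look at a prefix.
    return packet.startswith(b"WG")


def userspace_path(packets):
    """Every packet crosses the kernel/user boundary before it can be judged."""
    copied = 0
    wasted = 0
    accepted = []
    for pkt in packets:
        copied += len(pkt)      # models the kernel-to-user copy
        if is_valid(pkt):
            accepted.append(pkt)
        else:
            wasted += len(pkt)  # copied only to be thrown away
    return accepted, copied, wasted


def kernel_path(packets):
    """Invalid packets are dropped in place; nothing is copied to user space."""
    accepted = [pkt for pkt in packets if is_valid(pkt)]
    return accepted, 0, 0


def demo():
    # Two valid and two invalid packets, 100 bytes each.
    packets = [b"WG" + b"x" * 98, b"j" * 100, b"j" * 100, b"WG" + b"y" * 98]
    return userspace_path(packets), kernel_path(packets)
```

Under junk traffic the `wasted` counter grows with every invalid packet in the userspace model, while the in-kernel model never pays for them.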