Hacker News
by boricj 1344 days ago
> Many will cry about that performance hit (including me) because for over a decade our computers have gotten faster marginally, but our software has gotten slower and bloatier at an increasingly rapid pace.

We're talking about network stacks and network drivers, not web browsers. Migrating the network stack from the kernel to a user-land process is not going to measurably slow down web browsers, especially on modern systems with gigabytes of RAM, multiple cores, IOMMUs and whatnots.

> These are memory bugs, so the introduction of Rust into the kernel could help us here potentially, no need for an architectural revolution.

That would require rewriting the network stack and network drivers in Rust (driver code is much more likely to have bugs than the rest of the kernel) for this to be effective; otherwise you'll still have a lot of C code in the network path. I'd argue that this would be a bigger architectural revolution than porting the existing code and running it in user-land. MINIX3 went through such a change when drivers were removed from the kernel (can't find the publication about it right now), and the drivers only required reasonably small changes when ported to user-land; they were not rewritten from scratch.

But this is not just about memory safety, Rust code can still be vulnerable in many other ways (memory leaks, unsafe blocks, wrong assumptions, incorrect algorithm implementation, buggy/compromised toolchains...). Code running inside the trusted computing base of a system is a liability; enforcing privilege separation and the principle of least authority reduces that liability.


> We're talking about network stacks and network drivers, not web browsers.

Ah yes, the magic web-browser that doesn't do any kind of networking at all.

> Migrating the network stack from the kernel to a user-land process is not going to measurably slow down web browsers, especially on modern systems with IOMMUs and whatnots.

I don't know how you can possibly assert that, it's contradicting computer science's current understanding of operating system design as it relates to kernelmode/usermode switching, unless you're doing weird shared-memory things in userspace... which is terrifying.

> That would require rewriting the network stack and network drivers in Rust

Not really, C and Rust can interop just fine, you can have network drivers that are Rust but the actual networking stack itself can remain C, if you want.

> but this is not just about memory safety, Rust code can still be vulnerable in many other ways

The post is literally memory safety bugs.

> Ah yes, the magic web-browser that doesn't do any kind of networking at all.

The web browser isn't Netflix trying to serve hundreds of gigabits per second of encrypted video streams from a single server. Do you really need the ability to reliably saturate a 40 Gb/s Ethernet link to browse Hacker News comfortably? You'll hit various other bottlenecks long before performance for practical usages of web browsers will be significantly impacted by a user-land network stack.

As I've said, there are use-cases where extreme throughput and latency requirements warrant a design focusing on performance. Smartphones aren't one of them.

> I don't know how you can possibly assert that, it's contradicting computer science's current understanding of operating system design as it relates to kernelmode/usermode switching, unless you're doing weird shared-memory things in userspace... which is terrifying.

Again, not everyone is Netflix. I'd rather have a computer capped at 1 Gb/s speed with a user-land network stack than a computer capable of saturating a 40 Gb/s Ethernet link with a kernel network stack when I'm managing my bank accounts. Most end-users don't need ludicrously fast network speeds to browse funny cat GIFs on their web browsers.

Also, I've contributed code to multiple operating systems (MINIX3, SerenityOS). Running a user-land network stack isn't going to turn your 1 Gb/s Ethernet card into a 10 Mb/s Ethernet card.

> Not really, C and Rust can interop just fine, you can have network drivers that are Rust but the actual networking stack itself can remain C, if you want.

As far as I can tell, the bug is in the network stack itself. A network driver written in Rust wouldn't immunize your Linux kernel against this bug.

> The post is literally memory safety bugs.

The consequence is about computer security, of which memory safety bugs are but one cause among many.

> The web browser isn't Netflix trying to serve hundreds of gigabits per second of encrypted video streams from a single server.

Ironically, server workloads are the ones that are increasingly moving to networking stacks that run in user space, using frameworks like DPDK, with performance as a motivator: https://en.wikipedia.org/wiki/Data_Plane_Development_Kit

Of course, there are some caveats - from my understanding, typical DPDK use cases would turn over the entire NIC to a single application, meaning you aren't contending with sharing the network between multiple, potentially adversarial user mode processes. This is fine for a server, but not really appropriate for a PC or smartphone.

Yes, the way Netflix and co. are using userspace drivers is by passing entire devices to a single application.

There's no general purpose IPC happening there.

Netflix interestingly (rather than focusing on DPDK/user-space techniques) seems focused on increasing the throughput of kTLS on their CDN appliance boxes so they can simply sendfile(2) right out of VFS cache in kernel space for the bulk of the data plane. It's an alternative pathway to the same goal: increasing throughput by colocating your general data and your network stack state in the same context.

I wonder if io_uring will be able to remain competitive against DPDK-like approaches. Multi-tenant solutions are more attractive and seem like they could be extremely competitive, since they should be largely equivalent in the case that you have a single tenant.

> unless you're doing weird shared-memory things in userspace

Shared-memory things in userspace, i.e. buffers shared between 2 distinct user processes are no weirder than buffers that are shared between a user process and a kernel-mode driver. In both cases the buffers cannot be accessed by third parties.

Moreover, the transfer of data between 2 processes through a shared buffer can be done without any context switch (which could be slow), if the 2 processes are executed on distinct cores. Therefore having the network device driver as a distinct process does not have to cause any reduction in performance, if the means for inter-process communication are chosen wisely.
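As a rough illustration of the shared-buffer idea, here is a minimal Python sketch for Unix-like systems. It is not how a real driver process would be written: the function name `exchange_via_shared_buffer`, the buffer layout and the timeout are all made up for this example, and a real implementation would pin the two processes to distinct cores and use proper atomics and memory barriers instead of a plain flag byte. The point is only that, once the buffer is mapped into both processes, the hand-off itself is a write on one side and a poll on the other, with no system call in the data path.

```python
import mmap
import os
import struct
import time

# Hypothetical layout: byte 0 = "ready" flag, bytes 4..7 = payload length,
# payload starts at byte 8.
FLAG_OFFSET = 0
LENGTH_OFFSET = 4
DATA_OFFSET = 8
LENGTH = struct.Struct("<I")


def exchange_via_shared_buffer(payload: bytes, size: int = 4096,
                               timeout_s: float = 5.0) -> bytes:
    """Pass `payload` from a child process to the parent through shared memory."""
    if len(payload) > size - DATA_OFFSET:
        raise ValueError("payload does not fit in the shared buffer")

    # Anonymous mapping; on Unix it is shared with children created by fork().
    buf = mmap.mmap(-1, size)
    pid = os.fork()
    if pid == 0:
        # Child ("driver" side): write the data first, publish the flag last.
        try:
            buf[DATA_OFFSET:DATA_OFFSET + len(payload)] = payload
            LENGTH.pack_into(buf, LENGTH_OFFSET, len(payload))
            buf[FLAG_OFFSET] = 1
        finally:
            os._exit(0)

    # Parent ("application" side): busy-poll the flag instead of blocking
    # in the kernel, then read the payload straight out of the buffer.
    try:
        deadline = time.monotonic() + timeout_s
        while buf[FLAG_OFFSET] == 0:
            if time.monotonic() > deadline:
                raise TimeoutError("producer never published the buffer")
        (length,) = LENGTH.unpack_from(buf, LENGTH_OFFSET)
        return bytes(buf[DATA_OFFSET:DATA_OFFSET + length])
    finally:
        os.waitpid(pid, 0)
        buf.close()


if __name__ == "__main__":
    print(exchange_via_shared_buffer(b"packet bytes"))
```

Busy-polling burns a core while waiting, which is the usual trade-off of this style of IPC; whether that is acceptable depends on the workload.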

For any device driver that is implemented as a user process, the kernel can enable direct access to any I/O ports and memory-mapped I/O areas that are needed by the device, so the device driver can work in user mode without requiring any context switches.

Such direct I/O access cannot be enabled for ordinary processes, because those are not trusted enough and also because the direct I/O access could be enabled only for a single process at a time.

A dedicated device driver process solves both the trust problem and the multiple access problem just as well as a kernel-mode driver.

Things are more complicated. You can indeed have a very fast network driver in userspace (in fact for many use cases userspace networking is faster than the kernel). But where do you put the rest of the network stack?

> Ah yes, the magic web-browser that doesn't do any kind of networking at all.

They clearly didn't claim that. Your web browser being slow nowadays is not because it needs to do some networking.

They are claiming a loss in performance is ok.

I am claiming that people keep making this claim and it no longer holds true because software is already losing too much performance for the value we get back.

That's my whole thesis.

Your claim assumes that a small loss in performance in networking will lead to a loss in performance of the overall web browser, which is only true if networking is the bottleneck while browsing. And it usually isn't.

Ah, so you think the only thing I do with a computer is use the browser? That's weird, I was just giving an example of something that is so slow that it is literally unworkable in the modern day already.

Impacting networking affects the entire machine, especially in so far as a computer is increasingly just a dumb terminal to something else.

Look, if you make network requests potentially 20% slower then the browser performance will be impacted too, it's so obvious that I'm not sure how I can explain it more simply.

By how much? I am not sure, but you can't say it won't be slower at all unless we're talking about magic.

Pretending that it's trivial amounts of performance drop without evidence is the wrong approach. Show me how you can have similar performance with 20% increase in latency and I will change my stance here.

As it stands there are two things I know to be true:

Browsers rely on networking (as do many things, btw) and software is increasingly slow to provide similar value these days.

The point is that most users and use-cases of networking don't have high requirements on bandwidth or latency that warrant a network stack design focused on high performance. Let the ones who want to live on the edge do so if they want, but don't force your high performance, one-bug-away-from-total-disaster network stack design based on your own (probably overblown) requirements on everyone else.

Grandma doesn't care if her tablet can't saturate a WiFi 6 link. Grandma doesn't care if her bank's web page takes an extra 75µs to traverse the user-land network stack. But she will care a whole lot if her savings are emptied while managing her bank account through her tablet. Even worse if her only fault was having her tablet powered on when the smart toaster of a neighbor compromised it because of a remotely exploitable vulnerability in her tablet's WiFi stack.

Or are you suggesting that grandma should've known better than to let her tablet outside of a Faraday cage?

> Pretending that it's trivial amounts of performance drop without evidence is the wrong approach.

Amdahl's law begs to differ. If it takes 5s for the web site to arrive from the bank's server, spending 5µs or 500µs in the network stack is completely irrelevant to grandma. Upgrading her cable internet to fiber to cut these 5s down to 500ms will have a much more positive impact on her user experience than optimizing the crap out of her tablet's network stack from 5µs down to 1µs.
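To put rough numbers on the Amdahl's law argument, here is a small Python sketch. It reuses figures from the comment above (a 5 s page load, 5 µs versus 500 µs in the network stack, 5 s versus 500 ms on the wire). The function name `amdahl_speedup` and the split of a page load into exactly two phases are assumptions made for illustration, not measurements of any real system.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the total time gets `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Figures borrowed from the comment; the two-phase split is a simplification.
WIRE_S = 5.0       # time for the page to arrive from the bank's server
STACK_S = 500e-6   # time spent in a (slow) user-land network stack
TOTAL_S = WIRE_S + STACK_S

# Option A: make the network stack 100x faster (500 µs -> 5 µs).
stack_speedup = amdahl_speedup(STACK_S / TOTAL_S, 100.0)

# Option B: make the link 10x faster (5 s -> 500 ms), stack left untouched.
wire_speedup = amdahl_speedup(WIRE_S / TOTAL_S, 10.0)

print(f"100x faster stack: {stack_speedup:.6f}x overall")
print(f"10x faster link:   {wire_speedup:.3f}x overall")
```

Under these assumptions, a hundredfold faster stack improves the page load by roughly 0.01%, while a tenfold faster link improves it almost tenfold.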

> I'd argue that this would be a bigger architectural revolution than porting the existing code and running it in user-land. MINIX3 went through such a change

MINIX is a microkernel architecture - running drivers in userspace is one of its core features/selling points, and one that differentiates it from (modular) monolithic kernels such as Linux. So, this isn't a very solid line of reasoning.

It seems to me that the situation is the opposite - that moving drivers to userspace is an architectural change, which is more complex than porting an existing architecture to a new language.

> Rust code can still be vulnerable in many other ways

Sure, but not vulnerable in the way that the vulnerability under discussion is.

> memory leaks

Much harder in Rust than C, and also unlike in C, not going to result in security vulnerabilities.

> buggy/compromised toolchains

If you're going to assume that your toolchain is compromised, then anything is on the table, including the toolchain inserting a backdoor into the kernel and completely bypassing the proposed architectural change of moving drivers into user-space. And, needless to say, compiler bugs are rare in general, and compiler bugs that cause software vulnerabilities are nearly unheard of (and I've literally never seen one before).

Nobody thinking rationally is going to tell you that Rust is going to eliminate all your bugs or make your code secure. However, by far, the majority of security bugs in the Linux kernel are due to mistakes that the design of Rust either completely eliminates or massively reduces.

And security is intrinsically a tradeoff - the Linux kernel is not optimized for maximum security (which would be something formally-verified like seL4), but a compromise between security, performance, and development velocity. The claim is that Rust will provide significantly better security at basically the same performance and possibly modestly improved development velocity - the very least that one should do is rewrite the existing architecture in it (or, again, a language that meets or exceeds the specs of Rust) and then see what the bug rate is before deciding to take a guaranteed performance hit through an architectural change.

> MINIX is a microkernel architecture - running drivers in userspace is one of its core features/selling points, and one that differentiates it from (modular) monolithic kernels such as Linux. So, this isn't a very solid line of reasoning.

The first two versions of MINIX ran drivers inside the context of the kernel. The migration of drivers to user-space and overall emphasis on reliability didn't happen before MINIX3 in 2006.

Given that Linux nowadays has FUSE and UIO, still calling it a strictly monolithic kernel is probably a bit of a misnomer at this point. The same goes for Windows, NetBSD and others by the way.

> It seems to me that the situation is the opposite - that moving drivers to userspace is an architectural change, which is more complex than porting an existing architecture to a new language.

Trying to do a straightforward port of a C code base to Rust will quickly grind to a halt due to Rust's borrow checker. On a highly-optimized code base such as the Linux network stack, untangling every last optimization trick and shortcut to make the Rust compiler happy would require a large-scale refactoring that'll end up looking nothing like the original code base, at which point you might as well rewrite it from scratch.

In comparison, migrating that C code base to user-land would be less of a disruptive change to the code base (as was the case when MINIX3 did so with its drivers). It's still the same old network stack, adapted to run in a different environment.

> Sure, but not vulnerable in the way that the vulnerability under discussion is.

Rust isn't a silver bullet against every class of bug. Any code that runs inside the trusted computing base is one security bug away from system compromise. Writing that code in Rust doesn't change that.