by HippoBaro 974 days ago
I can't entirely agree with this, but I like how the authors think about the issue (S3 being a de facto standard without anything formal to back it up).

POSIX is outdated and problematic. It's outdated because it was conceived when hardware was vastly different from today's. Storage used to be orders of magnitude slower than compute. Today, it's the opposite. POSIX APIs make it impossible to squeeze all the juice out of the underlying hardware. A standard duplex 100Gbps network link can carry ~300M minimum-size frames per second (about 150M in each direction). You'll be lucky to do more than a few tens of thousands per second using POSIX APIs. That's a massive bottleneck.
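The ~300M figure checks out if "small frames" means minimum-size Ethernet frames. A quick back-of-the-envelope sketch, assuming 64-byte frames plus the 8-byte preamble/SFD and 12-byte inter-frame gap that every frame occupies on the wire (the function name is just for illustration):

```python
def min_frame_rate(link_bps: float, duplex: bool = True) -> float:
    # Smallest Ethernet frame is 64 bytes; each frame also costs an 8-byte
    # preamble/SFD and a 12-byte inter-frame gap, so 84 bytes on the wire.
    bits_per_frame = (64 + 8 + 12) * 8
    per_direction = link_bps / bits_per_frame
    # A duplex link carries that rate in both directions at once.
    return per_direction * (2 if duplex else 1)

rate = min_frame_rate(100e9)
print(f"{rate / 1e6:.1f}M frames/s")  # prints 297.6M frames/s
```

That works out to roughly 6.7ns per frame per direction, which is the budget a syscall-per-packet design would have to fit into.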

And it's not something that can be fixed with clever implementations. None of the POSIX APIs are async. So to drive concurrency, programmers have to resort to threads, which don't scale. That's a fundamental issue that no hardware improvement or software trickery will ever fix.
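To make the thread-per-operation point concrete, here's a minimal Python sketch (hypothetical helper names, pipes standing in for slow I/O). Each blocking `os.read` parks its own OS thread, so N in-flight reads cost N threads:

```python
import os
import threading

def blocking_read(fd, results, idx):
    # os.read blocks this thread until data is available on the pipe.
    results[idx] = os.read(fd, 64)

def read_concurrently(n):
    pipes = [os.pipe() for _ in range(n)]
    results = [None] * n
    # One OS thread per outstanding blocking read: that's the only way to
    # have n reads in flight with plain blocking calls.
    threads = [
        threading.Thread(target=blocking_read, args=(r, results, i))
        for i, (r, _w) in enumerate(pipes)
    ]
    for t in threads:
        t.start()
    # Feed every pipe so each blocked reader wakes up.
    for i, (_r, w) in enumerate(pipes):
        os.write(w, f"msg {i}".encode())
    for t in threads:
        t.join()
    for r, w in pipes:
        os.close(r)
        os.close(w)
    return results, len(threads)
```

Scale n to tens of thousands and you pay for that many stacks and context switches, which is the scaling problem described above.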

Today's reality is that software is seldom written against POSIX but against Linux, which offers many more APIs. Linux is, likewise, not formally standardized. It dodges the problem of S3 because it's open-source and ubiquitous. But that's not a good solution: it stifles innovation. For that reason, there hasn't been any new (production-ready, serious) kernel in decades.

We are at a deadlock, and POSIX is part of the problem.


That's funny. I was just listening to this video, "Node.js is badass rockstar tech" (https://www.youtube.com/watch?v=bzkRVzciAZg&t=1s), and one of the exact quotes is "Threads don't scale." I assumed it was just a joke, but here we see Poe's law in full effect.

I think there are a few misunderstandings there (the video is funny, btw). The kind of async IO I'm referring to exists one layer below whatever Apache and Node are using. Both are actually event-driven: they both use some flavor of epoll (see https://httpd.apache.org/docs/2.4/mod/event.html).
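For reference, here's roughly what the event-driven, epoll-style model looks like: a single thread asks the kernel which sockets are ready, then reads only from those. This is a minimal sketch using Python's `selectors` module (which picks epoll on Linux) with socket pairs standing in for clients. Note it's the readiness-based layer that Apache's event MPM and Node sit on, not the lower-level async IO I mean; the reads themselves are still ordinary synchronous calls.

```python
import selectors
import socket

def serve_with_event_loop(n_clients):
    # DefaultSelector uses epoll on Linux, kqueue on BSD/macOS.
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(n_clients)]
    for i, (server_end, _client_end) in enumerate(pairs):
        server_end.setblocking(False)
        sel.register(server_end, selectors.EVENT_READ, data=i)

    # Simulate clients sending requests.
    for i, (_server_end, client_end) in enumerate(pairs):
        client_end.sendall(f"hello {i}".encode())

    received = {}
    # One thread services every connection: wait for readiness, then read.
    while len(received) < n_clients:
        events = sel.select(timeout=1.0)
        if not events:
            break  # nothing became ready; avoid spinning forever
        for key, _mask in events:
            received[key.data] = key.fileobj.recv(64)
            sel.unregister(key.fileobj)

    sel.close()
    for a, b in pairs:
        a.close()
        b.close()
    return received
```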

Historically, buffered IO was enough to work around slow threads, simply because buffered IO operations usually don't block: you merely memcpy, and the kernel flushes asynchronously in the background. That can only take you so far, however, and it's apparent today that hardware can go much further and that the gap is widening.
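A small Python sketch of the buffered path (the helper name is made up for illustration): `write()` typically returns once the bytes are copied into the page cache, other readers see the data right away, and only `fsync()` waits for the device.

```python
import os
import tempfile

def buffered_write_then_read(payload: bytes):
    fd, path = tempfile.mkstemp()
    try:
        # Typically returns as soon as the bytes are copied into the kernel's
        # page cache; writeback to the device happens later in the background.
        n = os.write(fd, payload)
        # The data is already visible to readers via the page cache.
        with open(path, "rb") as f:
            visible = f.read()
        # fsync() is the call that blocks until the data reaches storage.
        os.fsync(fd)
        return n, visible
    finally:
        os.close(fd)
        os.remove(path)
```

The cheap part is the copy. Once you need durability (fsync) or more throughput than the page cache path gives you, you're back to blocking calls and threads.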

It's a valid point, though, to question whether that additional performance is even needed. John Ousterhout (https://www.youtube.com/watch?v=o2HBHckrdQc) is currently working on a new network protocol to alleviate some of the software performance problems TCP creates, and he also questions whether real-world applications even need very fast networking.

IMO, the mere existence of stuff like DPDK is proof enough that the need is real. Many folks use it and would rather use the kernel if it could provide comparable performance.