Hacker News
by KaiserPro 3140 days ago
I think Netflix uses BSD because they wanted to use BSD. Sure, some flavours of BSD have ZFS built in, but that's a pretty rare corner case.

Linux has two things that are extremely useful compared to BSD:

1) commercial backing (should one choose it)
2) first-class support for Infiniband, top-end Ethernet (should they use it) and storage controllers


Netflix uses BSD for OpenConnect because asynchronous disk I/O, which is critical for a CDN, remains a tire fire on Linux after more than 20 years.

On Linux you basically have to use blocking threads to emulate async disk I/O, which means tons of threads and overhead when you’re handling 10k-100k concurrent connections per box.

This is incorrect. Linux has had proper direct async disk I/O for a decade or more, used ubiquitously in database engines (among other things). It is not emulated with threads.

Last I looked (~4.4), Linux AIO implied DIO. Conflating AIO and DIO is the problem, not a feature. On FreeBSD, AIO works with the page cache for reads and writes, readahead works, sendfile works, and I/O, cache, readahead, and size hints all work. Linux has half of those, and DIO has none. As I recall.

Bingo. Async disk IO on Linux has to be unbuffered and block aligned, which makes it useful only for databases that manage their own caching, and useless for file systems.

A video CDN needs lots of concurrent access to a file system.
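
To make the "block aligned" restriction concrete, here is a minimal Python sketch of the offset arithmetic an application ends up doing before an unbuffered read. The 4096-byte block size and the helper names `align_range` and `trim` are assumptions for illustration only; the real alignment requirement depends on the device and file system.

```python
BLOCK = 4096  # assumed alignment; the real value depends on device and file system


def align_range(offset, length, block=BLOCK):
    """Widen the byte range [offset, offset + length) to block boundaries.

    Returns (aligned_offset, aligned_length), both multiples of `block`.
    """
    start = (offset // block) * block               # round the start down
    end = -(-(offset + length) // block) * block    # round the end up
    return start, end - start


def trim(data, offset, length, aligned_offset):
    """Cut the originally requested bytes back out of an aligned read."""
    skip = offset - aligned_offset
    return data[skip:skip + length]
```

So a 100-byte request at offset 5000 turns into a 4096-byte read at offset 4096, and the caller slices the 100 bytes back out afterwards. A buffered read through the page cache needs none of this.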

That's not really true anymore. Linux AIO works fine on XFS.

http://man7.org/linux/man-pages/man2/io_submit.2.html

Isn’t that still experimental?

From the current aio man page:

“Work has been in progress for some time on a kernel state-machine-based implementation of asynchronous I/O (see io_submit(2), io_setup(2), io_cancel(2), io_destroy(2), io_getevents(2)), but this implementation hasn't yet matured to the point where the POSIX AIO implementation can be completely reimplemented using the kernel system calls.”

Not true. I've regularly demonstrated very high throughput/connectivity with lots of little connections. The problem I have seen (not only with Linux) has been over-aggressive congestion controls, usually configured/set wrong.

On high performance async IO, this works quite well in Linux, and there are no blocking threads that I am aware of in that stack. The kernel uses bio dispatches to perform the actual block IO. If you are complaining about using bio to perform the actual IO, and that Linux includes this in its load calculation, sure, that is a conscious decision, as I understand it, on the part of the block layer folks. Is it wrong or bad? I don't think so, though others have different opinions.

FWIW ... I work at a place now using SmartOS as its primary OS. There are many people I know preferring BSD. Many people preferring linux. I have a different view, one that is not as popular as I hope it would be.

Specifically, I look at operating systems now, largely, as an implementation detail for your stack. You have a mission in many cases, unless you are an OS developer, that consumes the OS services layers to help you perform your mission. In many cases, specifics of the OS don't matter, as long as they don't get in your way. Sometimes the specifics of the OS help you.

From my view as an HPC guy, a hardware guy, a storage/compute/ML/GPU guy, I generally can work in Linux and BSD without pain. Minor config difference, but I am comfortable in both.

I am not, and have not been comfortable in AIX, HP/UX, and UnicOS. I used to enjoy IRIX until I started playing with Linux. I used Solaris and SunOS in the past, and SmartOS/illumos today.

As long as the OS has the tools I need, the libraries I need, or a way for me to build them, and doesn't constrain me or force me to contort to vagaries of the OS itself, I am fine with it.

A problem arises when people get caught up in "my OS > your OS", which this overall question at least brings in under the covers. This usually comes from various esoteric aspects of little relevance for the vast unwashed masses of users (like me). On the OS dev side, when this happens, it is usually defensive because something needed is missing, or some OS dev/manager (mis)believes that users don't actually need the features they are requesting.

That is actually a major problem, and it tends to drive people from your platform. Users aren't dumb, and there are many sophisticated people who have a deeper appreciation for the issues than "my OS > your OS".

Why *BSD isn't used might be for historical reasons, momentum, etc. It is perfectly fine as an OS, and quite usable for HPC. Similar for illumos/SmartOS (not simply saying that as I work for a company using SmartOS). There are missing things in both of these, and I am working (on the side) to try to help SmartOS get some of these things (user space stuff). FreeBSD in particular has most of what is needed.

Basically pick the system that works for you and your users. The OS, as I noted, can be viewed as a detail of the implementation. Or not.

But it's not a reason to create friction/tension between groups claiming OS1 > OS2 ...

The VI/Emacs wars are so 80s/90s ...

I think the issue tatersolid has with Linux AIO is the implicit DIO. That's really painful if you're working with HDDs or highly concurrent read scenarios. See my sibling comment for why.

That leads to people implementing “async io” threadpools in userland. Those threads then do “regular” blocking IO, which is able to use the page cache etc. Having hundreds or thousands of blocking IO threads then causes lots of other perf/scheduling issues.
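
For anyone who hasn't seen the pattern, here is a minimal Python sketch of such a userland pool, assuming a Unix system where `os.pread` is available. The names `blocking_read` and `read_ranges` are made up for illustration, and this shows the general idea rather than any particular server's implementation.

```python
import asyncio
import os
from concurrent.futures import ThreadPoolExecutor


def blocking_read(fd, length, offset):
    # Ordinary buffered read: it goes through the page cache, and the
    # calling thread blocks until the data is available.
    return os.pread(fd, length, offset)


async def read_ranges(path, ranges, workers=8):
    """Read (offset, length) ranges concurrently via a pool of blocking threads."""
    loop = asyncio.get_running_loop()
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Each read occupies one worker thread for its whole duration.
            tasks = [
                loop.run_in_executor(pool, blocking_read, fd, length, offset)
                for offset, length in ranges
            ]
            return await asyncio.gather(*tasks)
    finally:
        os.close(fd)
```

The event loop stays responsive, but every in-flight read still pins a thread, so the number of outstanding disk requests is capped by `workers`. Raising that cap into the hundreds or thousands is where the scheduling overhead comes from.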

It’s not just “my OS > your OS”: SmartOS is bulletproof when it comes to correctness of operation, data integrity, and ease of system administration. That shows up most prominently as less breakage, the absence of problems that on Linux are caused by technology concepts from the 1980s, and nights slept through instead of spent on conference calls with clueless managers screaming at 01:13 in the morning. These are all issues I have had with Linux and don’t have with SmartOS. That’s a big difference!

An OS is a priori better if I get to sleep through the night without an incident.

FreeBSD has commercial support from iXsystems.

FreeBSD (and perhaps some other BSDs) support “top-end” Ethernet as well. There was a great post on the Netflix blog a couple of months ago (discussed on HN) about how Netflix optimized their systems to serve video at 100Gbps.

FreeBSD does not support Infiniband, afaik.

https://medium.com/netflix-techblog/serving-100-gbps-from-an... is the article. I'd not actually seen that.

FreeBSD has supported Infiniband for over a decade.

Most modern HPC clusters use Infiniband and the more exotic Ethernet types. Having done courses on classic structured Ethernet setups, I find some of the challenges of building HPC clusters fascinating.

Netflix uses BSD because it has great IO handling.

https://www.quora.com/Why-did-Netflix-choose-FreeBSD-over-Li...

But all the supercomputers use their own custom Linux. So no commercial backing. Also, these computers are not your standard data center. They cut networking and storage to a minimum because those are bottlenecks. These things are just massive RAM/CPU/GPU boxes connected properly through PCI.

Edit: I was looking at the hardware specs for Sunway, the number one supercomputer. They use a PCI-E 3.0 connection for all their nodes. Communication between the nodes is 12GB/second with a latency of 1 us. Their total RAM is 1.31 PB.
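
As a rough illustration of what those two figures imply, here is a small Python sketch using the simple model "time = latency + size / bandwidth". It assumes GB means 10**9 bytes and ignores topology and contention, so treat the result as a back-of-the-envelope estimate, not a measurement. The function names are made up for this example.

```python
BANDWIDTH_BYTES_PER_S = 12_000_000_000  # 12 GB/s, taking GB as 10**9 bytes (assumption)
LATENCY_NS = 1_000                      # 1 us


def transfer_time_ns(size_bytes):
    """Estimated time to move one message: fixed latency plus size / bandwidth."""
    return LATENCY_NS + size_bytes * 1_000_000_000 / BANDWIDTH_BYTES_PER_S


def break_even_bytes():
    """Message size at which the bandwidth term equals the latency term."""
    return BANDWIDTH_BYTES_PER_S * LATENCY_NS // 1_000_000_000
```

Under this model the break-even size is 12,000 bytes: messages smaller than that are dominated by the 1 us latency rather than by the 12GB/second link.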

But all the supercomputers use their own custom Linux. So no commercial backing.

This is just wrong. Yes, they use custom Linux, but it is highly highly supported. You buy a Cray or a BlueGene and you get dedicated kernel engineers as well as on site support etc etc.

They cut networking and storage to a minimum because those are bottlenecks. These things are just massive RAM/CPU/GPU boxes connected properly through PCI.

This is just wrong. Networking is extremely important in supercomputers - but it isn't like setting up a LAN. They use custom networking, Infiniband, Aries, OmniPath etc. There isn't much information about the "PCIe Network" on the Sunway, but the fact it is PCIe isn't very interesting - everyone has fast optical networking. It's the topology and protocol which makes things interesting.

I don't consider it commercial Linux because they are not competing with other options. The companies that build these supercomputers have to provide technical support because nothing out there exists for it. Just a different view of what commercial Linux is vs building hardware-specific software.

It's very much commercial Linux, because you are paying for a service that's Linux based.

Sure, with how cheap Infiniband is (especially compared to 40/100 gig Ethernet) one _could_ cobble together a system yourself.

Where the magic sauce comes in, and where the likes of Cray really make things shine, is the software they provide to allow end users to _easily_ do multi-machine scaling.

Libraries for just-in-time delivery of data directly into RAM? Yup. Location-aware job dispatchers that co-locate jobs near each other logically? Yup.

All of those hard things are solved for you.

Red Hat is a commercial Linux because they are competing with other OSes/distros in this market. If I pay Joe $5 a month to keep my Ubuntu up to date, it doesn't make Ubuntu a commercial Linux even though I am paying for a Linux service. These companies building supercomputers are competing in producing supercomputers, not in providing a Linux distro and a service for said Linux. I very much doubt I could get access to their Linux distro and Linux service without first purchasing a supercomputer from them.

This is pretty much exactly how every HPC OS has been sold since the Cray X-MP. It's like if you buy Isilon - it is software, hardware and support you buy. No one argues that isn't commercial.

The Quora discussion doesn't seem to add much. Just a guy saying 'FreeBSD is rock-solid' and '[for] raw performance, .. nothing beats FreeBSD', without giving any technical details.