|
> How does this manage to scale?

"Okay" to "very poorly", depending on your view. And for the record, many times the drivers go untested, frankly. I've absolutely found completely broken drivers in the kernel (as in just obvious lock imbalances where unlock() can't possibly ever get called, null pointer derefs; simple things you can spot in a code review) where the break was introduced as part of a refactoring and the person doing the work made a mistake, because they also refactored 30 other drivers in the same patch series to match an API change. (The filesystem developers have said many times in patch reviews that certain things are massive chores because they have to go fix 40+ filesystem drivers every time some API break happens.) Linux actually has tons of hardware regressions because of this whole design choice.

When you totally rewrite a piece of code that is shared among multiple components, that can be OK, as long as you preserve the external behavior that the previous interface exposed. An easy way to do that is to establish a contract between the implementation and its call sites, which is often reflected as a stable API. The stable API makes some contracts explicit, by construction. But many times API refactorings happen at the same time, and in practice that typically introduces behaviors into the downstream code that didn't exist before. These new behaviors, when introduced into an existing driver that was not developed for them, tend to cause bugs when not tested fully. A big reason Linux chooses not to have stable internal APIs is agility, more or less. But nothing is free, and this is the price that is paid as the project grows.

My personal poster child for this stuff is the amdgpu driver. I use a Navi workstation card (WX5500) in my server, and whether or not the amdgpu driver functions correctly after an update is a crap shoot.
A while back it went completely headless. When I upgraded to 6.5 about 2 weeks ago and had to attach a monitor to look at the kernel logs (network config snafu), my dmesg had 7 kernel faults from amdgpu. Seven! For a 3.5 year old card! With no desktop environment! The card isn't changing, but the driver is: new hardware support, expanded interfaces, new features, and those cause regressions. Many subsystems get overhauls and behavioral changes every release, so these things are bound to happen. Testing is not uniform; some parts of the kernel are far better tested than others. Peripheral drivers are another example of easily broken code (Xilinx code upstream is frankly broken garbage half the time). The Linux kernel is a pretty amazing project in many respects, but I'm frankly astonished half the time that it actually boots to a working desktop. |
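
To make the "lock imbalance" case above concrete, here is a rough userspace sketch in Python rather than kernel C. Everything in it (`ToyDriver`, `write_reg_buggy`, `write_reg_fixed`, `lock_is_leaked`) is made up for illustration and is not taken from any real driver. It only shows the general shape of the mistake: a refactor adds an early return and that path never releases the lock.

```python
import threading


class ToyDriver:
    """Hypothetical stand-in for a driver; not real kernel code."""

    def __init__(self):
        self.lock = threading.Lock()
        self.regs = {}

    def write_reg_buggy(self, addr, value):
        # A refactor added input validation, but the new early return
        # skips the release, so the lock stays held on that path.
        self.lock.acquire()
        if addr < 0:
            return False  # BUG: no release on this path
        self.regs[addr] = value
        self.lock.release()
        return True

    def write_reg_fixed(self, addr, value):
        # The context manager releases the lock on every exit path.
        with self.lock:
            if addr < 0:
                return False
            self.regs[addr] = value
            return True


def lock_is_leaked(driver):
    # A non-blocking acquire fails if some earlier path left the lock held.
    if driver.lock.acquire(blocking=False):
        driver.lock.release()
        return False
    return True
```

The buggy version works fine on the common path, which is how this kind of thing survives when nobody exercises the error path on real hardware. Only the rejected write leaves the lock held, and the next caller would block forever.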