by vvanders 3480 days ago
My naive assumption is that code reuse across platforms is a good thing, I'd love to understand why this isn't the case here or what the concrete arguments are against it.

> My naive assumption is that code reuse across platforms is a good thing, I'd love to understand why this isn't the case here or what the concrete arguments are against it.

A driver is inherently platform-specific. It's glue that ties the hardware to the operating system. The only "correct" way to have one driver work on multiple operating systems is for the operating systems to all use the same driver model.

The ugly way is to create your own hardware abstraction layer and then write a translation layer between that and each operating system. It's the ugly way because it's complicated and hideous.

But it's especially silly because Linux accepts suitable contributed code, so you could instead use the native Linux model as your "intermediary layer" and fix Linux if it isn't suitable in some way. And then translate that to what the closed operating system you can't modify uses.

The result is that the Linux people are happier and you have one less translation layer to maintain.
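
To make the shape of that concrete, here is a minimal C sketch under stated assumptions: every name in it (`native_dev`, `native_ops`, `other_os_request`, `other_os_dispatch`) is hypothetical and only stands in for the real driver models, not for any actual Linux or Windows API. The hardware-specific code is written once against the native model, and the only translation layer is the thin adapter for the OS you can't modify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "native" driver model, standing in for the Linux side.
 * None of these names are real kernel APIs. */
struct native_dev {
    uint32_t regs[4];          /* fake register file */
};

struct native_ops {
    int (*power_on)(struct native_dev *dev);
    int (*set_mode)(struct native_dev *dev, uint32_t width, uint32_t height);
};

/* Hardware-specific code, written once against the native model. */
static int hw_power_on(struct native_dev *dev)
{
    dev->regs[0] = 1;
    return 0;
}

static int hw_set_mode(struct native_dev *dev, uint32_t width, uint32_t height)
{
    if (dev->regs[0] != 1)
        return -1;             /* device must be powered on first */
    dev->regs[1] = width;
    dev->regs[2] = height;
    return 0;
}

static const struct native_ops hw_ops = {
    .power_on = hw_power_on,
    .set_mode = hw_set_mode,
};

/* Hypothetical request format of the closed OS:
 * width in the high 16 bits, height in the low 16 bits. */
struct other_os_request {
    uint32_t packed_mode;
};

/* The single translation layer: adapts the other OS's request
 * to the native ops instead of going through a private HAL. */
static int other_os_dispatch(struct native_dev *dev,
                             const struct other_os_request *req)
{
    assert(dev != NULL && req != NULL);
    if (hw_ops.power_on(dev) != 0)
        return -1;
    return hw_ops.set_mode(dev, req->packed_mode >> 16,
                           req->packed_mode & 0xFFFFu);
}
```

On the native side, the ops table would be called directly with no adapter in between, which is the "one less translation layer" part of the argument.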

That might run into license issues. If you want to avoid licensing the other versions of your driver under GPLv2, you'd have to carefully avoid copying any code from the main kernel into your translation layer (rewriting any helper functions you end up using), and even then there's the idea of API copyright to contend with.

One might ask whether it is desirable to avoid the GPL, and there are a lot of arguments on both sides there, but it's certainly easy to run into issues when you have a GPL licensed module designed to be linked into a proprietary program (kernel).

> If you want to avoid licensing the other versions of your driver under GPLv2, you'd have to carefully avoid copying any code from the main kernel into your translation layer (rewriting any helper functions you end up using), and even then there's the idea of API copyright to contend with.

Isn't the point supposed to be to not have other versions of your driver, so you can use the same one on every platform?

By "other versions" I mean the codebase used for a given non-Linux platform, which would (hypothetically) include most of the Linux driver's code plus a translation layer from Linux APIs to that platform's.

The translation layer isn't where the interesting bits are. The parts hardware companies want to keep secret are the hardware-specific parts, not the OS-specific parts. It might even help them to open source the translation layers because then others could potentially use them and shoulder some of the maintenance cost.

I can't speak to the legal status of GPL drivers for Windows, but several seem to exist already (e.g. Windows ext4 driver), and if they were actually worried about it they could always get explicit permission from the copyright holders of the relevant code. Either they say yes and you're fine, or they say no and you know which pieces of code to replace.

> But it's especially silly because Linux accepts suitable contributed code, so you could instead use the native Linux model as your "intermediary layer" and fix Linux if it isn't suitable in some way. And then translate that to what the closed operating system you can't modify uses.

But Linux represents a tiny portion of the gaming community, so that approach would make no sense at all for a GPU vendor. C'mon.

Then they aren't going to get their driver upstream. End of story. Kernel developers have already done this once (Dave hinted at Exynos drivers in the past in his other posts) and it was a large amount of work to un-screw the pooch once all this crap came along.

I know that Linux people really really just want the kernel to take one for the team so they can have GPUs because that's just the goal, and clearly the goal is good and the means don't matter at all and everything else is irrelevant. 100,000 lines of crap code, 200k? 500k? Who cares, it's all in the name of GPUs clearly. It's obviously worth it no matter what.

But the kernel developers do not see it that way, and for good reason -- because once it's in tree, they are all on the hook for it and they all have to deal with the swamp, the added complexity, the maintenance, the un-fucking of this entire HAL, etc etc.

Having worked on a large open source project, I can assure you, it sucks when you have to say "This isn't acceptable and we aren't merging it", even when it's a feature the users want, and one someone worked on for a long time. It is also, almost always, the right thing to do in the long run (and several of those features did come back, in acceptable ways, in our case).

> But Linux represents a tiny portion of the gaming community, so that approach would make no sense at all for a GPU vendor. C'mon.

The growth market for GPUs is GPGPU and servers. And Linux represents a large portion of the programming and server communities.

More to the point, as soon as you support Linux at all then it doesn't matter who has more share, it's still less work to do the above than have to maintain another translation layer.

But AMD doesn't. GPGPU is already supported on nvidia drivers with their opaque blob. AMD has a more-transparent blob. People who want this to work already have a solution. This kernel change is probably important to some people, but those who simply want to run a GPGPU cluster on Linux already have workable solutions.

The GPGPU market is the polar opposite of the gaming market.

Game developers might like to see clean driver source but they don't get to choose what kind of GPU their customers have already bought. And 99% of gamers are not going to choose their GPU based on Linux drivers. So nobody has any leverage and vendors have no incentive to change.

Meanwhile thousands of universities and institutions are each going to be looking for 25,000 GPUs, and they can choose what brand they buy based on what makes their internal developers happy. Hosts like Amazon and Google are each going to be buying millions of GPUs, and better, more transparent drivers that let them, for example, improve power consumption by a small percentage can save them a million dollars a year in electricity.

Someone like Google could come to each vendor and say "first to have mainline kernel drivers gets all our business" at any point. Or the same result could come in the other order: once there are clean drivers, third parties are more likely to make power consumption and performance improvements that give AMD the edge when the major customers crunch the numbers.

There is a significant competitive advantage in it for AMD to get this right.

Very good point. There's definitely a growing market for high-bandwidth GPGPU solutions; neural networks are probably just the start.
I agree with you almost entirely, except the part about fixing Linux. If the abstraction that Linux provides isn't suitable for some reason, it probably isn't straightforward to change it because of compatibility with existing code.
That's not so much a concern within the kernel boundary, which is the case that applies here. If you have a compelling reason to redesign an internal API, you "just" have to fix up all the code across the tree that consumes it. Changes are regularly made to the internal VFS interfaces, for example.
It's also often the case that kernel-driver interfaces are extended without breaking compatibility. In those cases, you want to ensure that the extensions are suitable for more than one driver to consume.
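
A common way to do that kind of extension is an ops table where the new callback is optional and the core falls back to the old one when a driver doesn't provide it. Here is a rough sketch of the pattern in C. All names (`toy_driver_ops`, `core_enable`, and so on) are made up for illustration and are not real kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical driver interface, not a real kernel structure. */
struct toy_driver_ops {
    int (*enable)(void);
    /* Added later as an extension: optional.
     * Existing drivers simply leave it NULL. */
    int (*enable_with_flags)(unsigned int flags);
};

/* Core-side helper: prefer the extended callback when a driver
 * provides it, otherwise fall back to the original one. */
static int core_enable(const struct toy_driver_ops *ops, unsigned int flags)
{
    assert(ops != NULL);
    if (ops->enable_with_flags)
        return ops->enable_with_flags(flags);
    if (ops->enable)
        return ops->enable();
    return -1;                 /* driver implements neither callback */
}

/* An "old" driver that predates the extension. */
static int old_enable(void)
{
    return 1;
}

/* A "new" driver that consumes the extension. */
static int new_enable_with_flags(unsigned int flags)
{
    return 2 + (int)flags;
}

static const struct toy_driver_ops old_driver = {
    .enable = old_enable,
};

static const struct toy_driver_ops new_driver = {
    .enable = old_enable,
    .enable_with_flags = new_enable_with_flags,
};
```

The old driver keeps working untouched, and the extension is only worth adding if more than one driver would plausibly fill in the new callback.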
Or future changes in the Linux target need to be translated to all the other target wrappers.
The problem isn't that sharing code across platforms is bad, it's that not sharing code within Linux is bad. Airlie is basically saying that if the kernel API and subsystems are somehow inadequate, AMD should improve them directly instead of covering them up with a bunch more code.
> Airlie is basically saying that if the kernel API and subsystems are somehow inadequate, AMD should improve them directly instead of covering them up with a bunch more code.

And you really believe that the maintainers will accept a giant patch that changes the API and subsystem completely (though into something better) and risks causing lots of regressions in existing drivers? And you believe that AMD is supposed to fix all the regressions this change causes in other vendors' drivers?

Of course not, the maintainers will accept a well thought out series of patches that each make one small logical change towards the better interface.

And yes - who else is supposed to fix all the regressions caused by changes that AMD wants? Volunteers who would rather work on something else? If you want a change, you get to support the regressions - and if AMD's work gets merged, then anyone ELSE who wants to make a change in that area needs to support AMD's regressions.

Hence wanting to make sure that the changes from AMD are manageable and flexible enough to allow further changes.

> If you want a change, you get to support the regressions

And what about a change to a stable internal kernel API, which the kernel developers refuse?

No, they just want a stable future, around which they can plan the API, but so far no one has delivered on that tiny requirement.

Linux got where it is by evolving how the kernel and drivers interact whenever needed, without waiting to coordinate with outsiders and their closed work.

The Linux kernel does not have stable internal kernel APIs.

https://www.kernel.org/doc/Documentation/stable_api_nonsense...

This is exactly my point.

(without looking at details) The problem is that Windows and Linux expose hardware and drivers in different ways. You can shim things up to make the code work, but a driver that doesn't look like a Linux driver, doesn't work like a Linux driver, and can't easily be maintained by people working on the Linux graphics drivers is going to be a problem.

If the driver doesn't really belong in the Linux kernel source for those reasons, it's better to keep it outside the kernel tree.

AIUI, the problem is

code re-use between drivers of different vendors but the same kernel/OS,

VS

code re-use between drivers of the same vendor but different kernels/OSes.

At the end of the day, both sides are arguing for code re-use, of sorts.

The open source developers don't care about invisible code reuse in a closed source driver. HALs across open source codebases do exist too (e.g. for ZFS) but Linux in particular does not like them.

AMD should move their HAL code into their Windows driver, making it a superset of the Linux driver. AMD would get to reduce driver code duplication and Linux kernel developers wouldn't need to merge AMD's ugly Linux/Windows HAL.
> AMD should move their HAL code into their Windows driver, making it a superset of the Linux driver.

This might theoretically make sense if the Linux subsystem were very stable over many years. In practice, it's the Windows interfaces that are a lot more stable over the years, and changes to them are communicated well in advance so that hardware vendors can start changing their drivers early.

Regardless of one's thoughts on AMD, I think this is broadly true. Microsoft may do a lot of things poorly, but one thing they are good at (arguably, the only thing they're good at, hell maybe the key to their success, really) is maintaining compatibility and not breaking stuff.
This is explained here: https://www.kernel.org/doc/Documentation/stable_api_nonsense...

Linux maintains compatibility by having the kernel developers fix the drivers themselves when they break an interface. Microsoft cannot break its interfaces (actually, it can, and does), since it doesn't control the drivers.

This allows Linux to keep improving without breaking things in production, while Microsoft has to either maintain huge backward compatibility abstractions for changes, go YOLO and break stuff (often unknowingly), or abstain from improving their OS.

It is a good thing. For the developers of that piece of code (AMD in this case).

However, it introduces a second API for a very specific subset of hardware into a kernel that is developed by more than just AMD people. Dave Airlie is rightly saying that the second API, and hence two different code structures, make the whole DRI infrastructure harder to maintain for everyone else.

And Dave's responsibility is to everyone else, not to AMD.

It is a good thing for the driver writer as they have less difference between their targets.

It is a bad thing for the targets as they implement both the driver functionality and the abstractions required to make the same code work cross platform. The response linked describes the cost of those abstractions to the target (Linux kernel in this case).

I believe this link is the "concrete argument" against a unified abstraction layer in this particular instance.