	by joeguilmette 3480 days ago
	Anyone want to explain this in simple terms for us folk not knee-deep in kernel graphics driver politics?

3 comments

shmerl 3480 days ago

Very short: AMD made an abstraction layer to share effort between Linux and other platforms (e.g. Windows). Kernel/DRM maintainers don't like that, since it causes several issues detailed in that thread (harder to understand logic of the driver, slowdown of DRM improvement itself, indirect workflow of AMD developers and so on). For reference, DRM here is Direct Rendering Manager[1], nothing to do with crooked Digital Restrictions Management.

1. https://en.wikipedia.org/wiki/Direct_Rendering_Manager

wolfgke 3480 days ago

> nothing to do with crooked Digital Restrictions Management.

Please prefer the term "Digital Restriction Management". :-)

theparanoid 3480 days ago

Nvidia uses largely the same driver code for both linux and windows in their proprietary driver (I believe they call it a unified driver).

AMD tried the same in their open source driver and were rejected by the kernel maintainer. Unified drivers have code sharing advantages but don't follow the practices of the linux kernel.

vvanders 3480 days ago

My naive assumption is that code reuse across platforms is a good thing, I'd love to understand why this isn't the case here or what the concrete arguments are against it.

AnthonyMouse 3480 days ago

> My naive assumption is that code reuse across platforms is a good thing, I'd love to understand why this isn't the case here or what the concrete arguments are against it.

A driver is inherently platform-specific. It's glue that ties the hardware to the operating system. The only "correct" way to have one driver work on multiple operating systems is for the operating systems to all use the same driver model.

The ugly way is to create your own hardware abstraction layer and then write a translation layer between that and each operating system, because that's complicated and hideous.

But it's especially silly because Linux accepts suitable contributed code, so you could instead use the native Linux model as your "intermediary layer" and fix Linux if it isn't suitable in some way. And then translate that to what the closed operating system you can't modify uses.

The result is that the Linux people are happier and you have one less translation layer to maintain.
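To make the layering argument concrete, here is a toy Python model (not driver code) that just counts translation layers under each approach. The names `DriverDesign`, `vendor_hal`, and the two-target list are made up for illustration; the only thing taken from the comment above is the idea that every target whose native model differs from the shared one needs its own translation layer.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DriverDesign:
    """Toy description of how one driver codebase reaches several OSes."""
    name: str
    shared_model: str           # interface the common driver code is written against
    targets: Tuple[str, ...]    # operating systems the vendor ships on

    def translation_layers(self) -> List[str]:
        # Every target whose native driver model differs from the shared
        # model needs its own translation layer.
        return [
            f"{self.shared_model}->{target}"
            for target in self.targets
            if target != self.shared_model
        ]


targets = ("linux", "windows")

# Option A (hypothetical): vendor-private HAL, translated to every OS.
hal_design = DriverDesign("vendor HAL", shared_model="vendor_hal", targets=targets)

# Option B (hypothetical): write against the native Linux model and
# translate only for the closed OS that cannot be modified.
native_design = DriverDesign("Linux-native", shared_model="linux", targets=targets)

for design in (hal_design, native_design):
    print(design.name, design.translation_layers())
```

Under these assumptions the HAL design carries two translation layers and the Linux-native design carries one, which is the "one less translation layer to maintain" point.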

comex 3480 days ago

That might run into license issues. If you want to avoid licensing the other versions of your driver under GPLv2, you'd have to carefully avoid copying any code from the main kernel into your translation layer (rewriting any helper functions you end up using), and even then there's the idea of API copyright to contend with.

One might ask whether it is desirable to avoid the GPL, and there are a lot of arguments on both sides there, but it's certainly easy to run into issues when you have a GPL licensed module designed to be linked into a proprietary program (kernel).

AnthonyMouse 3479 days ago

> If you want to avoid licensing the other versions of your driver under GPLv2, you'd have to carefully avoid copying any code from the main kernel into your translation layer (rewriting any helper functions you end up using), and even then there's the idea of API copyright to contend with.

Isn't the point supposed to be to not have other versions of your driver, so you can use the same one on every platform?

comex 3479 days ago

By "other versions" I mean the codebase used for a given non-Linux platform, which would (hypothetically) include most of the Linux driver's code plus a translation layer from Linux APIs to that platform's.

EpicEng 3479 days ago

>But it's especially silly because Linux accepts suitable contributed code, so you could instead use the native Linux model as your "intermediary layer" and fix Linux if it isn't suitable in some way. And then translate that to what the closed operating system you can't modify uses.

But Linux represents a tiny portion of the gaming community, so that approach would make no sense at all for a GPU vendor. C'mon.

aseipp 3479 days ago

Then they aren't going to get their driver upstream. End of story. Kernel developers have already done this once (Dave hinted at Exynos drivers in the past in his other posts) and it was a large amount of work to un-screw the pooch once all this crap came along.

I know that Linux people really really just want the kernel to take one for the team so they can have GPUs because that's just the goal, and clearly the goal is good and the means don't matter at all and everything else is irrelevant. 100,000 lines of crap code, 200k? 500k? Who cares, it's all in the name of GPUs clearly. It's obviously worth it no matter what.

But the kernel developers do not see it that way, and for good reason -- because once it's in tree, they are all on the hook for it and they all have to deal with the swamp, the added complexity, the maintenance, the un-fucking of this entire HAL, etc etc.

Having worked on a large open source project, I can assure you, it sucks when you have to say "This isn't acceptable and we aren't merging it", even when it's a feature the users want, and one someone worked on for a long time. It is also, almost always, the right thing to do in the long run (and several of those features did come back, in acceptable ways, in our case).

AnthonyMouse 3479 days ago

> But Linux represents a tiny portion of the gaming community, so that approach would make no sense at all for a GPU vendor. C'mon.

The growth market for GPUs is GPGPU and servers. And Linux represents a large portion of the programming and server communities.

More to the point, as soon as you support Linux at all then it doesn't matter who has more share, it's still less work to do the above than have to maintain another translation layer.

freeone3000 3479 days ago

But AMD doesn't. GPGPU is already supported on nvidia drivers with their opaque blob. AMD has a more-transparent blob. People who want this to work already have a solution. This kernel change is probably important to some people, but those who simply want to run a GPGPU cluster on linux already have workable solutions.

toxik 3479 days ago

Very good point, there's definitely a growing market for high-bandwidth GPGPU solutions, neural networks is probably just the start.

nindalf 3480 days ago

I agree with you almost entirely, except the part about fixing Linux. If the abstraction that Linux provides isn't suitable for some reason, it probably isn't straightforward to change it because of compatibility with existing code.

caf 3480 days ago

That's not so much a concern within the kernel boundary, which is the case that applies here. If you have a compelling reason to redesign an internal API, you "just" have to fix up all the code across the tree that consumes it. Changes are regularly made to the internal VFS interfaces, for example.

wtallis 3480 days ago

It's also often the case that kernel-driver interfaces are extended without breaking compatibility. In those cases, you want to ensure that the extensions are suitable for more than one driver to consume.
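One common shape for "extending without breaking compatibility" is a table of driver callbacks where new entries are optional. The sketch below is a Python analogy only; `DisplayOps`, `set_gamma`, and `core_apply_gamma` are hypothetical names, not real kernel interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DisplayOps:
    """Hypothetical table of driver callbacks, loosely like an ops struct."""
    set_mode: Callable[[int, int], str]
    # Added later: optional, so drivers written before it existed still work.
    set_gamma: Optional[Callable[[float], str]] = None


def core_apply_gamma(ops: DisplayOps, gamma: float) -> str:
    # The shared core checks for the new hook and falls back when it is absent.
    if ops.set_gamma is None:
        return "gamma unsupported, using default"
    return ops.set_gamma(gamma)


# A driver that predates the extension, and one that implements it.
old_driver = DisplayOps(set_mode=lambda w, h: f"old: {w}x{h}")
new_driver = DisplayOps(
    set_mode=lambda w, h: f"new: {w}x{h}",
    set_gamma=lambda g: f"new: gamma {g}",
)

print(core_apply_gamma(old_driver, 2.2))
print(core_apply_gamma(new_driver, 2.2))
```

The design question raised above is whether a hook like `set_gamma` is general enough that a second driver could plausibly implement it, rather than being shaped around one vendor's hardware.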

posterboy 3480 days ago

Or future changes in the linux target need to be translated to all other target wrappers.

0xcde4c3db 3480 days ago

The problem isn't that sharing code across platforms is bad, it's that not sharing code within Linux is bad. Airlie is basically saying that if the kernel API and subsystems are somehow inadequate, AMD should improve them directly instead of covering them up with a bunch more code.

wolfgke 3480 days ago

> Airlie is basically saying that if the kernel API and subsystems are somehow inadequate, AMD should improve them directly instead of covering them up with a bunch more code.

And you really believe that the maintainers will be accepting a giant patch that changes the API and subsystem completely (though into something better) that has the risk of causing lots of regressions to existing drivers? And you believe that AMD is supposed to fix all the regressions that are caused in drivers by other vendors that this change causes?

brongondwana 3480 days ago

Of course not, the maintainers will accept a well thought out series of patches that each make one small logical change towards the better interface.

And yes - who else is supposed to fix all the regressions caused by changes that AMD wants? Volunteers who would rather work on something else? If you want a change, you get to support the regressions - and if AMD's work gets merged, then anyone ELSE who wants to make a change in that place needs to support AMD's regressions.

Hence wanting to make sure that the changes from AMD are manageable and flexible enough to allow further changes.

wolfgke 3480 days ago

> If you want a change, you get to support the regressions

And what about a change to a stable internal kernel API, which the kernel developers refuse?

toast0 3480 days ago

(without looking at details) The problem is that Windows and Linux expose hardware and drivers in different ways. You can shim things up to make the code work, but ending up with a driver that doesn't look like a Linux driver, doesn't work like a Linux driver, and can't easily be maintained by people working on the Linux graphics drivers is going to be a problem.

If the driver doesn't really belong in the Linux kernel source for those reasons, it's better to keep it outside the kernel tree.

Qwertious 3480 days ago

AIUI, the problem is

code re-use between drivers of different vendors but the same kernel/OS,

VS

code re-use between drivers of the same vendor but different kernels/OSes.

At the end of the day, both sides are arguing for code re-use, of sorts.

justincormack 3480 days ago

The open source developers don't care about invisible code reuse in a closed source driver. HALs across open source codebases do exist too (eg for ZFS) but Linux in particular does not like them.

cpeterso 3480 days ago

AMD should move their HAL code into their Windows driver, making it a superset of the Linux driver. AMD would get to reduce driver code duplication and Linux kernel developers wouldn't need to merge AMD's ugly Linux/Windows HAL.

wolfgke 3480 days ago

> AMD should move their HAL code into their Windows driver, making it a superset of the Linux driver.

This might theoretically make sense if the Linux subsystem were very stable over many years. In practice, the Windows interfaces are a lot more stable over the years, and changes to them are communicated well in advance so that hardware vendors can start changing their drivers early.

Kadin 3479 days ago

Regardless of one's thoughts on AMD, I think this is broadly true. Microsoft may do a lot of things poorly, but one thing they are good at (arguably, the only thing they're good at, hell maybe the key to their success, really) is maintaining compatibility and not breaking stuff.

tremon 3480 days ago

It is a good thing. For the developers of that piece of code (AMD in this case).

However, it is introducing a second API for a very specific subset of hardware into a kernel that is being developed by not just AMD people. Dave Airlie is rightly saying that the second API and hence two different code structures makes the whole DRI infrastructure harder to maintain for everyone else.

And Dave's responsibility is to everyone else, not to AMD.

din-9 3480 days ago

It is a good thing for the driver writer as they have less difference between their targets.

It is a bad thing for the targets as they implement both the driver functionality and the abstractions required to make the same code work cross platform. The response linked describes the cost of those abstractions to the target (Linux kernel in this case).

tlow 3480 days ago

I believe this link is the "concrete argument" against a unified abstraction layer in this particular instance.

pcr0 3480 days ago

But Nvidia's proprietary driver is a download right? Why is AMD trying to merge theirs into the kernel?

Qwertious 3480 days ago

Nvidia's proprietary driver breaks upon every new kernel release, which is why they have a shim. Furthermore, Nvidia can't ship their driver in the official Linux kernel due to copyright issues, and they're forced to handle all the maintenance burden of their driver (whereas AMD reaps the benefits of Intel's GPU driver bugfixing, and vice versa, thus lowering both Intel's and AMD's driver costs on Linux).

Besides, Nvidia's been having trouble with their Tegra GPUs on Android, and as a result have been forced to pitch in a bit on Nouveau (the reverse-engineered open-source Nvidia driver). They're still having trouble with their driver situation on mobile, as a result of their unwillingness to play ball with the kernel.

Actually, that last sentence above - I'm really not too confident on that, I've heard various hearsay but the only source I concretely remember is the "other drivers" section of http://richg42.blogspot.com.au/2014/05/the-truth-on-opengl-d...

Nullabillity 3479 days ago

> Furthermore, Nvidia can't ship their driver in the official Linux kernel

Nvidia has every ability to ship it, they just refuse to open it.

valdiorn 3479 days ago

probably rightfully so, if this is the sort of welcome they'd get.

pif 3479 days ago

Opening their driver's code is different from merging it into the kernel source.

Kubuxu 3479 days ago

There is a huge difference between opening code and merging it into the Linux kernel.

alkonaut 3479 days ago

> Nvidia's proprietary driver breaks upon every new kernel release, which is why they have a shim.

ELI5: why does each Linux kernel release break driver code? It can't be THAT hard to just have a stable interface and leave it for long periods of time, e.g. only bumping it on major version bumps in the Kernel?

aseipp 3479 days ago

Because in practice, APIs inside Linux do, in fact, change quite a bit -- and by itself maybe that wouldn't matter so much, but the nvidia driver has an insane amount of surface area on top of it. It's a massive driver. You can imagine then, that breaking it is actually easier than you might think.

There is no rule kernel interfaces can only change on major bumps. In reality, they change quite frequently, as new APIs and drivers are merged in, which requires generalization, refactoring, etc across API boundaries to keep things sane. Kernel developers specifically reject the notion of a "stable ABI" like this because they feel it would tie their hands, and lead them to design APIs and workarounds for things which would otherwise be fundamentally simple if you "just" break some function and its call sites. APIs in Linux tend to organically grow, and die, as they are needed, by this logic.

Why wait 5 years for a "major version bump" to delete an API call when you could just do it today and fix the callers, since they're all right there in the kernel tree? It's far easier and more straightforward to do this than attempting to work around "stable" systems for very long periods of time, which is likely to accumulate cruft.

Because they do not care about out-of-tree code, when an API changes, their obligations are to refactor the code using that API, inside the kernel, and nothing else. That means the person making the change also has to fix all the other drivers, too, even if they don't necessarily maintain them. Out of tree users will have to adapt on their own.

This also explains why they do not want a HAL. When a Linux driver interface changes, the person changing it is responsible for changing everything else and fixing other drivers. That means if AMD wants a large change, it may have to go and touch the Intel driver and refactor it to match the new API. If Intel wants something new, they may have to touch the AMD driver in turn. This, in effect, helps reduce the burden and share responsibilities among the affected people.

They don't want a HAL because a HAL is a massive impediment to exactly that workflow. If Intel wants to improve a DRM/DRI interface in the kernel for their GPUs, they could normally do so and touch all the other drivers. Out with the old, in with the new. But now, they'd have to also wade through like 50,000 lines of AMD abstraction code that no other system, no other driver, uses. It effectively makes life worse for every graphics subsystem maintainer when this happens, except for AMD I guess since they can pawn off some of the work. But if AMD plays by the rules -- Intel fixing their AMDGPU driver when they make a change shouldn't be that unusual, or any more difficult compared to any other graphics driver. And likewise -- AMD making a change and having to fix Intel's driver? That's just par for the course.

Obviously Linux isn't perfect here and they do, and have, accepted questionable things in the past, or have rejected seemingly reasonable API changes out of stability fear (while simultaneously not wanting a stable ABI -- which is fair). But the logic is basically something like the above, as to why this is all happening.
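The "change the API and fix the in-tree callers" workflow described above can be illustrated with a small Python analogy. Everything here is hypothetical (`core_alloc_buffer_v2`, the driver functions, the `flags` parameter); it only shows why call sites that the person making the change can see get updated, while out-of-tree ones break until their owner adapts.

```python
def core_alloc_buffer_v2(size: int, *, flags: int) -> dict:
    """New internal API: `flags` is now a required keyword argument."""
    return {"size": size, "flags": flags}


# In-tree drivers: the author of the API change updated these call sites
# in the same patch series, even though other people maintain them.
def in_tree_driver_a() -> dict:
    return core_alloc_buffer_v2(4096, flags=0)


def in_tree_driver_b() -> dict:
    return core_alloc_buffer_v2(8192, flags=1)


# Out-of-tree driver: nobody upstream saw this call site, so it still uses
# the old one-argument form and fails until its owner adapts it.
def out_of_tree_driver() -> dict:
    return core_alloc_buffer_v2(4096)  # old calling convention


def still_works(driver) -> bool:
    """Return True if the driver's call into the core API still succeeds."""
    try:
        driver()
        return True
    except TypeError:
        return False


for drv in (in_tree_driver_a, in_tree_driver_b, out_of_tree_driver):
    print(drv.__name__, "ok" if still_works(drv) else "broken")
```

In this picture, a vendor HAL would be an extra layer between the in-tree driver functions and the core API, and the person changing the core would have to understand and update that layer too.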

wtallis 3480 days ago

AMD's recent strategy has been to try to confine the proprietary stuff to userspace, and to implement an open-source kernel driver that can be used by either the proprietary userspace driver or open-source userspace stack.

theparanoid 3480 days ago

It's already part of the kernel, this was a re-architecture of the display portion of the driver.

ajdlinux 3480 days ago

The amdgpu driver is open source, not proprietary.

gonmf 3480 days ago

Open source != free software.

wolfgke 3480 days ago

The difference between open source and free software is mostly in the political camp the word comes from. Read the OSI definition of open source if you don't believe it:

> https://opensource.org/osd-annotated

The reasons why "free software" people don't like the word "open source" are indeed political:

> https://www.gnu.org/philosophy/open-source-misses-the-point....

For software for which the source code is available, but does not give the four freedoms:

> https://www.gnu.org/philosophy/free-sw.html#content

it is common to use the term "shared source" (originally devised by Microsoft):

> https://en.wikipedia.org/wiki/Shared_source

SXX 3479 days ago

The kernel driver of the proprietary AMDGPU-PRO is licensed exactly the same way as modules in the mainline kernel. Most of them are dual-licensed under MIT and GPL so BSD and other projects can use them.

majewsky 3479 days ago

But it is free software. It resides in the kernel tree: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-st...

topkekz 3480 days ago

https://www.phoronix.com/scan.php?page=article&item=amd_cata...

Note that both the amd and the nvidia kernel modules always have been FOSS because of the GPL license. It's just that nvidia provides it by its own ways, not through the official linux branch, and thus doesn't have to respect linux rules nor to document the driver.

SXX 3479 days ago

> Note that both the amd and the nvidia kernel modules always have been FOSS because of the GPL license.

The only open source part of their modules was the shim, while 99% of the driver is contained in a blob. That's true for both Nvidia and ATI/AMD fglrx.

tayo42 3480 days ago

Maybe a stupid question but why is it a big deal? Who else does this affect other than nvidia and amd?

foota 3480 days ago

I think the idea is that the kernel maintainers can't break the code that the AMD driver relies on and in order to properly do that they need to be able to easily grok all the driver implementations, and abstraction layers make that more difficult.

kazarov 3480 days ago

I don't know anything about linux display architecture, but this point sounds really weird to me from general software engineering perspective. Isn't one of the goals of having drivers in an OS to establish a formalized interface between drivers and kernel, and thus achieve separation of concerns between driver maintainers and kernel maintainers? Requiring that kernel maintainers understand how all drivers work does not sound very scalable.

wtallis 3480 days ago

Linux developers want to be able to share code across drivers for similar devices, and they want to be able to refactor and improve that shared code without worrying about out-of-tree drivers. That strategy has worked well for eg. WiFi drivers where the shared mac80211 subsystem allows a lot of logic to be pulled out of individual NIC drivers, and improvements to mac80211 more or less automatically benefit all participating drivers.
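As a rough picture of the shared-subsystem idea, here is a Python sketch in which common logic lives in one shared layer and each driver keeps only its hardware-specific part. This is an analogy, not how mac80211 is actually structured; `SharedMacLayer`, `ImprovedMacLayer`, `NicDriver`, and the framing format are all invented for the example.

```python
import zlib


class SharedMacLayer:
    """Hypothetical shared subsystem holding logic common to all drivers."""

    def build_frame(self, payload: bytes) -> bytes:
        # Common framing logic lives here once, not in each driver.
        header = len(payload).to_bytes(2, "big")
        return header + payload


class ImprovedMacLayer(SharedMacLayer):
    """A later improvement to the shared code: append a CRC32 trailer."""

    def build_frame(self, payload: bytes) -> bytes:
        frame = super().build_frame(payload)
        return frame + zlib.crc32(payload).to_bytes(4, "big")


class NicDriver:
    """Toy driver: only the hardware-specific part stays here."""

    def __init__(self, name: str, mac: SharedMacLayer) -> None:
        self.name = name
        self.mac = mac

    def transmit(self, payload: bytes) -> int:
        frame = self.mac.build_frame(payload)
        # Stand-in for writing to hardware: report how many bytes were sent.
        return len(frame)


# Two drivers from different (hypothetical) vendors share the same layer,
# so swapping in the improved layer changes both without touching either.
for mac in (SharedMacLayer(), ImprovedMacLayer()):
    drivers = [NicDriver("vendor_a", mac), NicDriver("vendor_b", mac)]
    print(type(mac).__name__, [d.transmit(b"hello") for d in drivers])
```

The flip side, as discussed in this thread, is that the shared layer can only be refactored freely if all of its users are visible in the same tree.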

kazarov 3480 days ago

Makes sense, but it doesn't have to be mutually exclusive. It is possible to have a fixed network driver interface, and some common helpers orthogonal to it. This way driver developers could choose whether to benefit from common code or not. I guess if Linux developers want to enforce code sharing, this wouldn't work, but I wonder why they would do it. Seems like it just makes life harder for both parties.

bnastic 3480 days ago

It's exactly the opposite with Linux, where stable kernel API/ABI are avoided as a matter of principle.

Qwertious 3480 days ago

Internal API/ABI is broken regularly, but Linus is fairly clear on the subject of breaking userspace - https://lkml.org/lkml/2012/12/23/75

tracker1 3480 days ago

I wouldn't say avoided.. A lot of the interfaces are still compatible with prior versions as is the software that runs on it, or we'd be on Kernel v50+ by now. Not all software, but a bit.

That said, the community isn't afraid of breaking changes to push future versions forward.

caf 3480 days ago

That's a stable internal kernel API/ABI that's not provided. The stable external (userspace-facing) ABI is very much maintained.

foota 3480 days ago

I agree that this is how it seems it should be. I was mostly basing my comment off of this: https://lists.freedesktop.org/archives/dri-devel/2016-Februa...

d1str0 3480 days ago

Thank you. This comment clarifies things quite a bit for me.

posterboy 3480 days ago

there are many other gpu vendors, e.g. in arm cores. I guess this is the code tree in question: https://github.com/torvalds/linux/tree/master/drivers/gpu/dr...

jcoffland 3480 days ago

Intel for one.

rasz_pl 3480 days ago

Short version: AMD as a company is dysfunctional. Perfectly happy to throw away $300Mil on a failed acquisition, unwilling to hire a sufficient number of competent driver developers.

Result is no people to do the required work.

chei0iaV 3479 days ago

Where did you get this idea from? Just because a patch set got rejected, we can dismiss the fact that AMD had been providing excellent open source driver support for years, and just call them incompetent?

EpicEng 3479 days ago

AMD is prioritizing the business here, which makes perfect sense. Why would they spend more money than necessary to appease the Linux maintainers in order to serve the tiny population of Linux gamers who probably don't give a crap about how the kernel is maintained? Their business is on Windows.

mixedCase 3479 days ago

Neither AMD nor Nvidia develop Linux drivers for Linux gamers. Not that they don't pay attention to us with fixes and optimizations here and there, but a good part of that is games simply being applications that make use of a lot of driver functionality that might not receive enough testing otherwise.

serge2k 3479 days ago

more like they don't want to spend twice as much trying to maintain a separate linux driver, because that's a bit ridiculous.