| > So you say having a decoupled arrangement in software (which happens to be a de facto open standard) is a "terrible awful idea" and that instead you should just rely on whatever your proprietary hardware graphics vendor proposes to you? Why? Sandboxing, and resource quotas / allocations / reservations. By itself, a paravirtualized GPU just treats each userland workload launched by any given guest onto the GPU, as all being siblings — exactly as if there was no virtualization and you were just running multiple workloads on one host. And so, just like multiple GPU-using apps on a single non-virtualized host, these workloads will get "thin-provisioned" the resources they need, as they ask for them, with no advance reservation; and workloads may very well end up fighting over those resources, if they attempt to use a lot of them. You're just not supposed to run two things that attempt to use "as much VRAM as possible" at once. This means that, on a multi-tenant hypervisor host (e.g. the "with GPU" compute machines in most clouds), a paravirtualized GPU would give no protection at all from one tenant using all of a host GPU's resources, leaving none left over for the other guests sharing that host GPU. The cloud vendor would have guaranteed each tenant so much GPU capacity — but that guarantee would be empty! To enforce multi-tenant QoS, you need hardware-supported virtualization — i.e. the ability to make "all of the GPU" actually mean "some of the GPU", defining how much GPU that is on a per-guest basis. (And even in PC use-cases, you don't want a guest to be able to starve the host! Especially if you might be running untrusted workloads inside the guest, for e.g. forensic analysis!) |
An operating system doesn't require virtualisation to manage application resource usage of CPU time, system memory, disk storage, etc – although the details differ from OS to OS, most operating systems have quota and/or prioritisation mechanisms for these – why not for the GPU too?
There is no reason in principle why you can't do that for the GPU too. In fact, there have been a series of Linux cgroup patches going back several years now, to add GPU quotas to Linux cgroups, so you can setup per-app quotas on GPU time and GPU memory – https://lwn.net/ml/cgroups/20231024160727.282960-1-tvrtko.ur... is the most recent I could find (from 6-7 months back), but there were earlier iterations broader in scope, e.g. https://lwn.net/ml/cgroups/20210126214626.16260-1-brian.welt... (from 3+ years ago). For whatever reason none of these have yet been merged to the mainline Linux kernel, but I expect it is going to happen eventually (especially with all the current focus on GPUs for AI applications). Once you have cgroups support for GPUs, why couldn't a paravirtualised GPU driver on a Linux host use that to provide GPU resource management?
And I don't see why it has to wait for GPU cgroups to be upstreamed in the Linux kernel – if all you care about is VMs and not any non-virtualised apps on the same hardware, why couldn't the hypervisor implement the same logic inside a paravirtualised GPU driver?