by jowsie 1892 days ago
Same as any hypervisor/virtual machine setup. Sharing resources. You can build 1 big server with 1 big GPU and have multiple people doing multiple things on it at once, or one person using all the resources for a single intensive load.

Thanks, this is a concise answer.

However, I was under the impression - at least on Linux - that I could run multiple workloads in parallel on the same GPU without having to resort to vGPU.

I seem to be missing something.

You can, but only directly under that OS. If you wanted to run, say, a Windows VM to run a game that doesn't work in Wine, you'd need some way to give a virtual GPU to the virtual machine. (As it is now, the only way you'd be able to do this is to have a separate GPU that's dedicated to the VM and pass that through entirely.)
In addition to the answer by skykooler, virtual GPUs also allow you to set hard resource limits (e.g., amount of L2 cache, number of streaming multiprocessors), so different workloads do not interfere with each other.
If you are running Linux in a VM, vGPU will allow accelerated OpenGL, WebGL, and Vulkan applications such as games, CAD, CAM, and EDA tools.
This[1] may help.

What you're saying is true, but it's generally using either the API remoting or device emulation methods mentioned on that wiki page. In those cases, the VM does not see your actual GPU device, but an emulated device provided by the VM software. I'm running Windows within Parallels on a Mac, and here[2] is a screenshot showing the different devices each sees.

In the general case, the multiplexing is all software based. The guest VM talks to an emulated GPU, the virtualized device driver then passes those calls to the hypervisor/host, which then generates equivalent calls to the physical GPU, and the results travel back up the chain. So while you're still ultimately using the GPU, the software-based indirection introduces a performance penalty and a potential bottleneck. You're also limited to the intersection of capabilities exposed by your virtualized GPU driver, the hypervisor system, and the driver being used by that hypervisor (or host OS, for Type 2 hypervisors). The table under API remoting shows just how varied 3D acceleration support is across different hypervisors.

As an alternative to that, you can use fixed passthrough to directly expose your physical GPU to the VM. This lets you tap into the full capabilities of the GPU (or other PCI device), and achieves near-native performance. The graphics calls you make in the VM now go directly to the GPU, cutting out the game of telephone that emulated devices play. Assuming, of course, your video card drivers aren't actively trying to block you from running within a VM[3].

The problem is that when a device is assigned to a guest VM in this manner, that VM gets exclusive access to it. Even the host OS can't use it while it's assigned to the guest.
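
To make the exclusivity concrete, here is a minimal sketch for a Linux host. It assumes the usual sysfs layout, where each PCI device directory has a `driver` symlink pointing at whichever kernel driver currently owns it, and it assumes the common KVM/VFIO setup in which a card handed to a guest is bound to `vfio-pci` instead of the normal graphics driver. The function names and the `sysfs_root` parameter are mine, added only so the check can be pointed at a fake tree.

```python
import os


def bound_driver(pci_address, sysfs_root="/sys"):
    """Return the name of the kernel driver bound to a PCI device, or None.

    The lookup follows the 'driver' symlink in the device's sysfs directory.
    'sysfs_root' is a hypothetical parameter so the function can be tested
    against a temporary directory tree.
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_address, "driver")
    if not os.path.islink(link):
        # No symlink means no driver is bound (or the device does not exist).
        return None
    return os.path.basename(os.readlink(link))


def is_reserved_for_guest(pci_address, sysfs_root="/sys"):
    """True if the device is bound to vfio-pci, i.e. set aside for a VM.

    Assumption: the host uses VFIO for passthrough. Other hypervisors use
    different stub drivers, which this check will not recognise.
    """
    return bound_driver(pci_address, sysfs_root) == "vfio-pci"
```

Calling `bound_driver("0000:01:00.0")` (a made-up address, substitute one from `lspci -D`) would typically report the regular graphics driver on a normal desktop and `vfio-pci` once the card has been set aside for a guest, at which point the host has nothing left to render with on that card.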

This article is about the fourth option – mediated passthrough. The vGPU functionality enables the graphics card to expose itself as multiple logical interfaces. So every VM gets its own logical interface to the GPU and sends calls directly to the physical GPU like it does in normal passthrough mode, and the hardware handles the multiplexing aspect instead of the host/hypervisor worrying about it. Which gives you the best of both worlds.
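
If you want to see what those logical interfaces look like on a Linux host, the kernel's mediated device (mdev) framework lists them in sysfs. The sketch below assumes the layout described in the kernel's VFIO mediated device documentation (`/sys/class/mdev_bus/<parent>/mdev_supported_types/<type>/` with `name` and `available_instances` files). Whether a particular card and driver expose their vGPU profiles this way is an assumption on my part; it depends on the vendor and driver version. The function name and `sysfs_root` parameter are hypothetical.

```python
import os


def _read_text(path):
    """Return the stripped contents of a small sysfs file, or None if absent."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def list_mdev_types(sysfs_root="/sys"):
    """List the mediated device types each capable parent device offers.

    Returns a list of dicts with the parent device, the type id, the
    human-readable name (optional in sysfs, so it may be None), and how
    many more instances of that type can still be created.
    """
    bus = os.path.join(sysfs_root, "class", "mdev_bus")
    found = []
    if not os.path.isdir(bus):
        # No mdev-capable device or driver is loaded on this host.
        return found
    for parent in sorted(os.listdir(bus)):
        types_dir = os.path.join(bus, parent, "mdev_supported_types")
        if not os.path.isdir(types_dir):
            continue
        for type_id in sorted(os.listdir(types_dir)):
            type_dir = os.path.join(types_dir, type_id)
            available = _read_text(os.path.join(type_dir, "available_instances"))
            found.append({
                "parent": parent,
                "type": type_id,
                "name": _read_text(os.path.join(type_dir, "name")),
                "available": int(available) if available and available.isdigit() else None,
            })
    return found
```

Each entry corresponds to one kind of virtual GPU slice the parent card can hand out, and `available` drops as instances are created and assigned to VMs. An empty list just means nothing on the host registers with the mdev framework.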

[1] https://en.wikipedia.org/wiki/GPU_virtualization

[2] https://imgur.com/VMAGs5D

[3] https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVM...