by dyingkneepad 1584 days ago

On my system, the CPU sees the GPU as a PCI device. The "PCI config space" [0] is standardized, so the CPU can read it and figure out the GPU's device ID, vendor ID, revision, class, etc. From that, the OS looks through its PCI drivers for one that claims to drive that specific device_id/vendor_id combination (or that class, in case there's some kind of generic driver for a whole class).
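
To make the lookup above concrete, here is a rough Python sketch, not real kernel code. The field offsets are the ones from the standard config header; the driver names, the match table and the device ID 0x1234 are made up for illustration. On Linux the raw bytes can usually be read from /sys/bus/pci/devices/<address>/config, but the sketch uses a synthetic header so it runs anywhere.

```python
import struct
from typing import NamedTuple, Optional


class PciId(NamedTuple):
    vendor_id: int
    device_id: int
    revision: int
    class_code: int  # base class byte only (0x03 = display controller)


def parse_pci_header(cfg: bytes) -> PciId:
    """Pull the identifying fields out of a raw PCI config header."""
    # 0x00: vendor ID, 0x02: device ID (both 16-bit little-endian)
    vendor_id, device_id = struct.unpack_from("<HH", cfg, 0x00)
    # 0x08: revision, 0x09: prog IF, 0x0A: subclass, 0x0B: base class
    revision, _prog_if, _subclass, base_class = struct.unpack_from("<BBBB", cfg, 0x08)
    return PciId(vendor_id, device_id, revision, base_class)


# Hypothetical driver table: each driver lists the fields it matches on.
DRIVERS = [
    ("example_gpu", {"vendor_id": 0x8086, "device_id": 0x1234}),
    ("generic_display", {"class_code": 0x03}),
]


def find_driver(dev: PciId) -> Optional[str]:
    """Return the first driver whose match fields all equal the device's."""
    for name, match in DRIVERS:
        if all(getattr(dev, field) == value for field, value in match.items()):
            return name
    return None


# Synthetic 64-byte header: vendor 0x8086, device 0x1234, revision 2, class 0x03.
header = struct.pack("<HHHHBBBB", 0x8086, 0x1234, 0, 0, 0x02, 0, 0, 0x03).ljust(64, b"\x00")
print(find_driver(parse_pci_header(header)))
```

The specific driver is listed before the generic class driver, so it wins when both would match.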

From there, the driver pretty much knows what to do. Primarily, the driver will map the registers to memory addresses, so accessing offset 0xF0 in that map is equivalent to accessing register 0xF0. The definition of what each register does is something that the HW developers provide to the SW developers [1].
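
A small Python sketch of the "registers are just offsets into a map" idea. The `RegisterWindow` class and the value written are made up, and an anonymous mmap stands in for the device so the code runs without hardware. On Linux a real register region can typically be mapped from /sys/bus/pci/devices/<address>/resource0 (as root), but a real driver must also control access width and ordering, which this does not attempt.

```python
import mmap
import struct


class RegisterWindow:
    """Treats a mapped buffer as a bank of 32-bit little-endian registers."""

    def __init__(self, buf):
        self._buf = buf

    def read32(self, offset: int) -> int:
        return struct.unpack_from("<I", self._buf, offset)[0]

    def write32(self, offset: int, value: int) -> None:
        struct.pack_into("<I", self._buf, offset, value & 0xFFFFFFFF)


# Stand-in for a mapped register region: one page of anonymous memory.
bar = mmap.mmap(-1, 4096)
regs = RegisterWindow(bar)

# "Accessing offset 0xF0 in the map" is how register 0xF0 gets touched.
regs.write32(0xF0, 0xDEADBEEF)
print(hex(regs.read32(0xF0)))
```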

Setting modes (screen resolution) and a lot of other stuff is done directly by reading and writing these registers. At some point the CPU and GPU also have to talk about memory (and virtual addresses), and there's quite a complicated dance to map GPU virtual memory to CPU virtual memory. On discrete GPUs the data is actually "sent" to the memory somehow through the PCI bus (I suppose the GPU can read directly from the memory without going through the CPU?), but in the driver this is usually abstracted to "this is another memory map". On integrated systems both the CPU and GPU read directly from system memory, but they may not share all caches, so extra care is required here. In fact, caches may also mess up the communication on discrete graphics, so extra care is always required. The work in this paragraph is mostly done by the kernel driver in Linux.
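
A toy model of the address "dance", assuming a flat single-level page table with 4 KiB pages. Real GPU and CPU page tables are multi-level and hardware-specific, and every address below is invented. The point is only that the CPU and the GPU can each have their own virtual address for the same physical buffer.

```python
PAGE_SIZE = 4096


class ToyPageTable:
    """Single-level virtual-to-physical page map (illustration only)."""

    def __init__(self):
        self._pages = {}  # virtual page number -> physical page number

    def map(self, virt_addr: int, phys_addr: int) -> None:
        if virt_addr % PAGE_SIZE or phys_addr % PAGE_SIZE:
            raise ValueError("addresses must be page-aligned")
        self._pages[virt_addr // PAGE_SIZE] = phys_addr // PAGE_SIZE

    def translate(self, virt_addr: int) -> int:
        page, offset = divmod(virt_addr, PAGE_SIZE)
        return self._pages[page] * PAGE_SIZE + offset


cpu_pt = ToyPageTable()
gpu_pt = ToyPageTable()

# One physical buffer, visible at different virtual addresses on each side.
buffer_phys = 0x8000_0000
cpu_pt.map(0x7F00_0000_0000, buffer_phys)
gpu_pt.map(0x0010_0000, buffer_phys)

# Byte 0x40 of the buffer resolves to the same physical address both ways.
print(hex(cpu_pt.translate(0x7F00_0000_0040)), hex(gpu_pt.translate(0x0010_0040)))
```

The model says nothing about caches, which is exactly the part that needs the extra care mentioned above.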

At some point the CPU will tell the GPU that a certain region of memory is the framebuffer to be displayed. Then the CPU will build binary programs written in the GPU's machine code, submit those programs (batches), and the GPU will execute them. These programs are generally in the form of "I'm using textures from these addresses, this memory holds the fragment shader, this other one holds the geometry shader, the configuration of threading and execution units is described in this structure as you specified, SSBO index 0 is at this address, now go and run everything". After everything is done the CPU may even get an interrupt from the GPU saying things are done, so the driver can notify user space. This paragraph describes mostly the work done by the user space driver (in Linux, this is Mesa), which implements the OpenGL/Vulkan/etc. abstractions.
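
To give a feel for what a batch looks like, here is a toy encoding in Python. The opcodes, the 16-byte command layout and `toy_gpu_execute` are all invented for this sketch; real command streams are defined by each GPU's documentation and look nothing like this in detail. The callback stands in for the completion interrupt.

```python
import struct

# Hypothetical opcodes, not taken from any real GPU.
OP_SET_TEXTURE = 0x01
OP_SET_FRAGMENT_SHADER = 0x02
OP_SET_SSBO = 0x03
OP_RUN = 0xFF

CMD_FORMAT = "<IIQ"  # opcode, slot index, 64-bit GPU address (16 bytes)


def encode_cmd(opcode: int, index: int = 0, address: int = 0) -> bytes:
    return struct.pack(CMD_FORMAT, opcode, index, address)


def build_batch(texture_addr: int, fragment_shader_addr: int, ssbo0_addr: int) -> bytes:
    """CPU side: say where everything lives, then tell the GPU to run."""
    return b"".join([
        encode_cmd(OP_SET_TEXTURE, 0, texture_addr),
        encode_cmd(OP_SET_FRAGMENT_SHADER, 0, fragment_shader_addr),
        encode_cmd(OP_SET_SSBO, 0, ssbo0_addr),
        encode_cmd(OP_RUN),
    ])


def toy_gpu_execute(batch: bytes, on_complete) -> None:
    """Pretend GPU: record the state, then fire the 'interrupt' on OP_RUN."""
    state = {}
    for opcode, index, address in struct.iter_unpack(CMD_FORMAT, batch):
        if opcode == OP_RUN:
            on_complete(state)
            return
        state[(opcode, index)] = address
    raise ValueError("batch ended without OP_RUN")


completed = []
toy_gpu_execute(build_batch(0x1000, 0x2000, 0x3000), completed.append)
print(len(completed), hex(completed[0][(OP_SET_SSBO, 0)]))
```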

[0]: https://en.wikipedia.org/wiki/PCI_configuration_space
[1]: https://01.org/linuxgraphics/documentation/hardware-specific...