| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by imtringued 922 days ago

Unified shared memory is an intel specific extension of OpenCL.

SYCL builds on top of OpenCL so you need to know the history of OpenCL. OpenCL 2.0 introduced shared virtual memory, which is basically the most insane way of doing it. Even with coarse grained shared virtual memory, memory pages can transparently migrate from host to device on access. This is difficult to implement in hardware. The only good implementations were on iGPUs simply because the memory is already shared. No vendor, not even AMD could implement this demanding feature. You would need full cache coherence from the processor to the GPU, something that is only possible with something like CXL and that one isn't ready even to this day.

So OpenCL 2.x was basically dead. It has unimplementable mandatory features so nobody wrote software for OpenCL 2.x.

Khronos then decided to make OpenCL 3.0, which gets rid of all these difficult to implement features so vendors can finally move on.

So, Intel is building their Arc GPUs and they decided to create a variant of shared virtual memory that is actually implementable called unified shared memory.

The idea is the following: All USM buffers are accessible by CPU and GPU, but the location is defined by the developer. Host memory stays on the host and the GPU must access it over PCIe. Device memory stays on the GPU and the host must access it over PCIe. These types of memory already cover the vast majority of use cases and can be implemented by anyone. Then finally, there is "shared" memory, which can migrate between CPU and GPU in a coarse grained matter. This isn't page level. The entire buffer gets moved as far as I am aware. This allows you to do CPU work then GPU work and then CPU work. What doesn't exist is a fully cache coherent form of shared memory.

https://registry.khronos.org/OpenCL/extensions/intel/cl_inte...