Hacker News
by rbanffy 554 days ago
> Lots of people think GPUs can only do floating point math.

IIRC, every Raspberry Pi is brought up by the GPU setting up the system before the CPU is brought out of reset and the bootloader looks for the OS.

> it is a travesty to persist with the "offloading to accelerator" model.

Operating systems would need to support heterogeneous processors running programs with different ISAs accessing the same pools of memory. I'd LOVE to see that. It'd be extremely convenient to have first-class processes running on the GPU MIMD cores.

I'm not sure there is much research done in that space. I believe IBM mainframe OSs have something like that because programmers are exposed to the various hardware assists that run as coprocessors sharing the main memory with the OS and applications.

> I'm not sure there is much research done in that space.

There is. And the finest example I can think of is Barrelfish https://barrelfish.org

Interesting - it resembles a network of heterogeneous systems that can share a memory space used primarily for explicit data exchange. Not quite what I was imagining, but probably much simpler to implement than a Unix where the kernel can see processes running on different ISAs on a shared memory space.

I guess hardware availability is an issue, as there aren't many computers with, say, an ARM, a RISC-V, an x86, and an AMD iGPU sharing a common memory pool.

OTOH, there are many where a 32-bit ARM shares the memory pool with 64-bit cores. Usually the big cores run applications while the small ARM does housekeeping or other low-latency tasks.

> Not quite what I was imagining, but probably much simpler to implement than a Unix where the kernel can see processes running on different ISAs on a shared memory space.

Indeed. The other argument is that treating the computer as a distributed system can make it scale better to, say, hundreds of cores compared to a lock-based SMP system.
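
To make the contrast concrete, here is a minimal sketch in Python. It is a toy model, not anything taken from Barrelfish: each simulated core owns a private counter, only that core's thread ever touches it, and everyone else changes it by sending a message to the core's inbox. The names `Core`, `send`, `stop`, and `run_demo` are made up for illustration.

```python
import queue
import threading


class Core:
    """Toy 'core' that owns its state; others reach it only via messages."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.inbox = queue.Queue()
        self.counter = 0  # private state, touched only by this core's thread
        self.thread = threading.Thread(target=self._run)
        self.thread.start()

    def _run(self):
        while True:
            msg = self.inbox.get()
            if msg is None:  # shutdown sentinel
                break
            self.counter += msg  # single owner, so no shared lock on the state

    def send(self, amount):
        self.inbox.put(amount)

    def stop(self):
        self.inbox.put(None)
        self.thread.join()


def run_demo(num_cores=4, messages_per_core=1000):
    cores = [Core(i) for i in range(num_cores)]
    for core in cores:
        for _ in range(messages_per_core):
            core.send(1)
    for core in cores:
        core.stop()  # join before reading the counters
    return [core.counter for core in cores]


if __name__ == "__main__":
    print(run_demo())
```

Python's `queue.Queue` uses a lock internally, so this only models the ownership pattern (state is never shared, only messages are) and says nothing about real scaling behaviour.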

> treating the computer as a distributed system

Sure, but where's the fun in that?

Until GPGPUs, there was no reason to build a machine with multiple CPUs of different architectures except running different OSs on them (such as the Macs, Suns and Unisys mainframes with x86 boards for running Windows side-by-side with a more civilized OS). With GPGPUs you have machines with a set of processors that are good at many things but not great at SIMD, and one that's awesome at SIMD but sucks for most other things.

And, as I mentioned before, there are lots of ARM machines with 64-bit and ultra-low-power 32-bit cores sharing the same memory map. Also, even x86 variants with different ISA extensions can be treated as different architectures by the OS - Intel had to disable AVX-512 on the fast cores of its early asymmetric parts because the low-power cores couldn't execute it and OSs would not support migrating a process to the right core on an invalid instruction fault.
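
For what "migrating a process to the right core on an invalid instruction fault" could look like, here is a toy simulation in Python. It is only a sketch of the idea, not how any existing OS behaves: `SimCore`, `IllegalInstruction`, and `run_with_migration` are hypothetical names, and an "instruction" is just a (mnemonic, required extension) pair.

```python
from dataclasses import dataclass


class IllegalInstruction(Exception):
    """Raised when a core meets an instruction from an extension it lacks."""


@dataclass
class SimCore:
    name: str
    extensions: frozenset

    def execute(self, instruction):
        # instruction is a (mnemonic, required_extension) pair
        mnemonic, required = instruction
        if required not in self.extensions:
            raise IllegalInstruction(f"{self.name} cannot run {mnemonic}")
        return f"{mnemonic}@{self.name}"


def run_with_migration(program, cores, start=0):
    """Run on cores[start]; on a fault, move to a capable core and retry."""
    current = cores[start]
    trace = []
    for instruction in program:
        try:
            trace.append(current.execute(instruction))
        except IllegalInstruction:
            required = instruction[1]
            capable = [c for c in cores if required in c.extensions]
            if not capable:
                raise  # no core supports it, so the fault is genuine
            current = capable[0]  # "migrate" the process
            trace.append(current.execute(instruction))  # retry the instruction
    return trace


little = SimCore("little", frozenset({"base"}))
big = SimCore("big", frozenset({"base", "avx512"}))
program = [("add", "base"), ("vaddps", "avx512"), ("sub", "base")]
print(run_with_migration(program, [little, big]))
```

In this sketch the process simply stays on the big core after the fault; a real scheduler would also have to decide when it is safe to move it back.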

The problem is that GPUs are kind of bad at being general-purpose, so it doesn't really make sense to expose the hardware that way.

If the OS supports it, you can make programs that start threads on CPUs and GPUs and let those communicate. You run the SIMD-ish functions on the GPUs and the non-SIMD-heavy functions on the CPU cores.
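
A rough sketch of the shape such a program might take, in Python. Big assumption: no real GPU is involved. Both pools are ordinary threads and "gpu" is only a label, so this shows the split of work and the hand-off of results, not real heterogeneous execution. `saxpy`, `classify`, and `run_pipeline` are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor


def saxpy(a, xs, ys):
    """Data-parallel work: the same operation applied to every element."""
    return [a * x + y for x, y in zip(xs, ys)]


def classify(values):
    """Branchy, data-dependent work, the kind you would keep on a CPU core."""
    labels = []
    for v in values:
        if v < 0:
            labels.append("neg")
        elif v == 0:
            labels.append("zero")
        else:
            labels.append("pos")
    return labels


def run_pipeline(a, xs, ys):
    # Both executors are plain thread pools; "gpu" is a stand-in label only.
    with ThreadPoolExecutor(max_workers=1, thread_name_prefix="gpu") as gpu, \
         ThreadPoolExecutor(max_workers=1, thread_name_prefix="cpu") as cpu:
        simd_result = gpu.submit(saxpy, a, xs, ys).result()
        # Hand the "GPU" output to the "CPU" side for the branchy step.
        return cpu.submit(classify, simd_result).result()


if __name__ == "__main__":
    print(run_pipeline(2, [1, -1, 0], [0, 0, 0]))
```

With shared memory the hand-off between the two sides would just be passing a reference, which is the convenience being argued for here.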

I have a strong suspicion GPUs aren't as bad at general-purpose stuff as we perceive and we underutilize them because it's inconvenient to shuttle data over an architectural wall that's not really there in iGPUs.

Maybe it doesn't make sense, but it'd be worth looking into just to know where the borders of the problem lie.

Nah, they're pretty bad. They don't speculate or prefetch nearly as well as CPUs, and most code kind of relies on that to be fast. If you are programming for a GPU and you want to go fast you generally have to work quite hard for it.