| HN Mirror



	by Joel_Mckay 490 days ago
	In general, NVIDIA never had proper bug-free C support for well over a decade (hidden de-allocation errors etc.), and essentially everyone focused on the CUDA compiler with the C++ API. To be honest, it still bothers me that an awful GPU mailbox design is still the cutting-edge tech for modern computing. GPU rootkits are already a thing... Best of luck =3

2 comments

MrLeap 490 days ago

"GPU rootkit" sounds like a misnomer unless GPUs start getting rewritable persistent storage (maybe they do now and my knowledge is out of date).

If you've got malicious code in your GPU, shut it off, wait a few seconds, and turn it back on.

Actually looking at the definition, my understanding might be off or the definition has morphed over time. I used to think it wasn't a rootkit unless it survived reinstalling the OS.


Joel_Mckay 490 days ago

GPUs have direct access to the DMA channel of your storage device, and PoCs have shown that MMU/CPU bypass is feasible.

My point was the current architecture is a kludge built on a kludge... =3


einpoklum 490 days ago

> with the C++ API

The funny thing is that the "C++ API" is almost entirely C-like, forgoing almost everything beneficial and convenient about C++, while at the same time not being properly limited to C.

(which is why I wrote this: https://github.com/eyalroz/cuda-api-wrappers/ )
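To make the "C-like" complaint concrete, here is a minimal sketch contrasting the two styles. It assumes nothing about the linked wrappers' actual API: `fake_malloc` and `fake_free` are hypothetical stand-ins that only mimic the shape of calls such as `cudaMalloc`/`cudaFree` (status code returned, result through an out-parameter) and allocate host memory so it runs without a GPU. `fake_buffer` and `make_buffer` are likewise made-up names for a generic RAII approach.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical stand-ins, NOT the real CUDA API: they copy only the calling
// convention (status return, out-parameter) and use host memory.
enum fake_status { fake_success = 0, fake_error_alloc = 1 };

// Tracks outstanding allocations so leaks are observable in this sketch.
inline int& live_allocations() {
    static int count = 0;
    return count;
}

inline fake_status fake_malloc(void** ptr, std::size_t bytes) {
    *ptr = std::malloc(bytes);
    if (*ptr == nullptr) return fake_error_alloc;
    ++live_allocations();
    return fake_success;
}

inline fake_status fake_free(void* ptr) {
    if (ptr != nullptr) {
        std::free(ptr);
        --live_allocations();
    }
    return fake_success;
}

// C-like style: each call site checks a status and must remember to free.
inline bool c_style_use(std::size_t n) {
    void* buf = nullptr;
    if (fake_malloc(&buf, n * sizeof(float)) != fake_success) return false;
    static_cast<float*>(buf)[0] = 1.0f;
    return fake_free(buf) == fake_success;  // omitting this call would leak
}

// RAII style: ownership lives in the type, failure becomes an exception.
struct fake_deleter {
    void operator()(void* p) const noexcept { fake_free(p); }
};
using fake_buffer = std::unique_ptr<void, fake_deleter>;

inline fake_buffer make_buffer(std::size_t bytes) {
    void* raw = nullptr;
    if (fake_malloc(&raw, bytes) != fake_success) throw std::bad_alloc{};
    return fake_buffer{raw};
}

// Returns true if the buffer was released automatically at scope exit.
inline bool raii_style_use(std::size_t n) {
    const int before = live_allocations();
    {
        fake_buffer buf = make_buffer(n * sizeof(float));
        static_cast<float*>(buf.get())[0] = 1.0f;
        assert(live_allocations() == before + 1);
    }  // fake_deleter runs here, no explicit free needed
    return live_allocations() == before;
}
```

Calling `c_style_use(16)` and `raii_style_use(16)` from a `main` should both succeed; the difference is that only the first depends on the caller remembering the cleanup call.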

> an awful GPU mailbox design is still the cutting-edge tech

Can you elaborate on what you mean by a "mailbox design"?


pjmlp 489 days ago

Depends on which CUDA API one is looking at:

https://docs.nvidia.com/cuda/cuda-c-std/index.html


einpoklum 489 days ago

I meant the fundamental ones, mostly:

* CUDA Driver API: https://docs.nvidia.com/cuda/cuda-driver-api/index.html
* NVRTC: https://docs.nvidia.com/cuda/nvrtc/index.html
* (CUDA Runtime API, very popular but not entirely fundamental as it rests on the driver API)

The CUDA C++ library is a behemoth that sits on top of other things.


Joel_Mckay 490 days ago

In general, a modern GPU must copy its workload into and out of its own working area in VRAM regardless of the compute capability number, and is thus constrained by the same clock-domain-crossing performance bottleneck many times per transfer.

At least the C++ part of the system was functional enough to build the current house of cards. Best of luck =3
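The copy-in, compute, copy-out pattern described in this comment can be sketched roughly as follows. This is a deliberately simplified host-only model, not real GPU code: `fake_device`, its `vram` vector, and the `bytes_crossed` counter are all hypothetical names, and the model ignores clock domains, bus latency, and anything else about real hardware. It only shows that every element has to cross the host/device boundary twice for a single operation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical model of a discrete accelerator with its own memory.
// A plain std::vector stands in for VRAM; nothing here touches a GPU.
struct fake_device {
    std::vector<float> vram;        // stand-in for device memory
    std::size_t bytes_crossed = 0;  // traffic over the host/device boundary

    // Host -> device copy.
    void upload(const std::vector<float>& host) {
        vram.resize(host.size());
        if (!host.empty()) {
            std::memcpy(vram.data(), host.data(), host.size() * sizeof(float));
        }
        bytes_crossed += host.size() * sizeof(float);
    }

    // The "kernel": it can only see data already resident in vram.
    void scale(float factor) {
        for (float& x : vram) x *= factor;
    }

    // Device -> host copy.
    void download(std::vector<float>& host) {
        host.resize(vram.size());
        if (!vram.empty()) {
            std::memcpy(host.data(), vram.data(), vram.size() * sizeof(float));
        }
        bytes_crossed += vram.size() * sizeof(float);
    }
};

// Scales `data` in place "on the device" and returns the number of bytes
// that had to cross the boundary to do so.
inline std::size_t scale_on_device(std::vector<float>& data, float factor) {
    fake_device dev;
    dev.upload(data);    // copy the workload in
    dev.scale(factor);   // compute on the device-side copy
    dev.download(data);  // copy the result back out
    assert(dev.vram.size() == data.size());
    return dev.bytes_crossed;
}
```

For a vector of N floats, `scale_on_device` reports `2 * N * sizeof(float)` bytes of boundary traffic in this model, regardless of how cheap the computation itself is.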
