Hacker News


	by mohsen1 483 days ago
	> For extreme performance, we discover and use an out-of-doc PTX instruction: `ld.global.nc.L1::no_allocate.L2::256B`. This instruction will lead to an undefined behavior: accessing volatile GPU memory with non-coherent read-only PTX modifiers `.nc`. But the correctness is tested to be guaranteed with `.L1::no_allocate` on Hopper architectures, and performance will be much better.


k_sze 483 days ago

Practically speaking, is it possible for NVIDIA to "pull the rug" later, intentionally or otherwise, by subtly changing the behaviour of this out-of-doc instruction on new architectures?


ammo1662 483 days ago

They could. That's why there is a switch to disable it.

> If you find kernels not working on some other platforms, you may add `DISABLE_AGGRESSIVE_PTX_INSTRS=1` to `setup.py` and disable this, or file an issue.
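To make the mechanism concrete, here is a minimal CUDA sketch of how such a switch could be wired. It is not the project's actual source. It assumes the flag reaches the compiler as a preprocessor define named `DISABLE_AGGRESSIVE_PTX_INSTRS` (the thread doesn't say how `setup.py` forwards it) and that the safe fallback is an ordinary `ld.volatile.global`. The macro `LD_NC_FUNC` and the functions `ld_nc_global`, `copy_kernel` and `aggressive_ptx_enabled` are hypothetical names. The aggressive instruction is only emitted when compiling for sm_90 (Hopper), since that is the only architecture the quoted README says was tested.

```cuda
#include <cstring>

// Assumed compile-time switch, named after the README flag (hypothetical wiring).
// Defining DISABLE_AGGRESSIVE_PTX_INSTRS swaps the undocumented non-coherent
// load for an ordinary volatile load.
#ifndef DISABLE_AGGRESSIVE_PTX_INSTRS
#define LD_NC_FUNC "ld.global.nc.L1::no_allocate.L2::256B"
#else
#define LD_NC_FUNC "ld.volatile.global"
#endif

// Load one 32-bit int from global memory through the selected instruction.
// .nc = non-coherent read-only path, .L1::no_allocate = don't keep the line
// in L1, .L2::256B = ask L2 to prefetch 256 bytes.
__device__ __forceinline__ int ld_nc_global(const int* ptr) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 900
    int value;
    // String-literal concatenation yields e.g.
    // "ld.global.nc.L1::no_allocate.L2::256B.s32 %0, [%1];"
    asm volatile(LD_NC_FUNC ".s32 %0, [%1];" : "=r"(value) : "l"(ptr));
    return value;
#else
    // Pre-Hopper targets: plain volatile load, regardless of the switch.
    return *reinterpret_cast<const volatile int*>(ptr);
#endif
}

// Hypothetical consumer kernel: copy n ints using the helper above.
__global__ void copy_kernel(const int* src, int* dst, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] = ld_nc_global(src + i);
}

// Host-side report of which instruction the switch selected for this
// translation unit, e.g. for logging at startup.
bool aggressive_ptx_enabled() {
    return std::strcmp(LD_NC_FUNC, "ld.volatile.global") != 0;
}
```

Building with `nvcc -arch=sm_90` emits the undocumented `.nc` load; adding `-DDISABLE_AGGRESSIVE_PTX_INSTRS` (or targeting an older architecture) falls back to the volatile load, giving up some bandwidth in exchange for behavior the PTX documentation actually defines. Keeping the choice at compile time means the hot path pays no runtime branch for it.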
