	by janwas 572 days ago
	My understanding is that a lot of this code is hand-written (not just fine-tuned). AFAIK cuDNN and TensorRT are written directly in SASS, not CUDA. And the presence of FP8 in H100, but not A100, would likely require a complete rewrite.

1 comment

dragontamer 572 days ago

CUB, Thrust, and many other libraries that make those kernels possible don't need to be rewritten.

When you write a merge sort in CUDA, you can keep it across all versions. Maybe the new instructions can improve a few corner cases, but it's not like going from AVX to AVX-512, where you need to rewrite everything.

Ex: https://github.com/NVIDIA/cub/blob/main/cub/device/device_me...


janwas 571 days ago

I agree not everything needs to be rewritten. And neither does code using an abstraction such as Highway, so we can stop beating that dead horse.
