by dragontamer
572 days ago
CUB, Thrust, and the many other libraries that make those kernels possible don't need to be rewritten. When you write a merge sort in CUDA, you can keep it across all versions. Maybe the new instructions can improve a few corner cases, but it's not like going from AVX to AVX-512, where you need to rewrite everything. Ex: https://github.com/NVIDIA/cub/blob/main/cub/device/device_me...