by ckitching
698 days ago
[I work on SCALE]
Mapping inline PTX to AMD machine code would indeed suck. Converting it to LLVM IR right at the start of compilation (when the initial IR is being generated) is much simpler, since it then gets "compiled forward" with the rest of the code. It's as if you wrote C++/intrinsics/whatever instead.
Note that nvcc accepts a different dialect of C++ from clang (and hence hipcc), so there is in fact more that separates CUDA from HIP (at the language level) than just find/replace. We discuss this a little in [the manual](https://docs.scale-lang.com/manual/dialects/).
Handling differences between the atomic models is, indeed, "fun". But since CUDA is a programming language with documented semantics for its memory consistency (and so is PTX), it is entirely possible to arrange for the compiler to "play by NVIDIA's rules".
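To make the "convert inline PTX to IR up front" idea concrete, here is a toy sketch. It is not SCALE's implementation: the function name `lowerPtxToIr`, the three supported opcodes, and the textual output format are all assumptions made for illustration. A real frontend would build IR through the compiler's own APIs and handle operand constraints, types, and predicates.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (illustration only): turn one three-operand inline PTX
// integer instruction into a line of LLVM IR text. Anything outside the
// tiny supported subset yields an empty string.
std::string lowerPtxToIr(const std::string& ptx) {
    std::istringstream in(ptx);
    std::string opcode;
    in >> opcode;

    // Collect operands, dropping the trailing ',' or ';' on each token.
    std::vector<std::string> operands;
    std::string tok;
    while (in >> tok) {
        while (!tok.empty() && (tok.back() == ',' || tok.back() == ';')) {
            tok.pop_back();
        }
        if (!tok.empty()) operands.push_back(tok);
    }

    // Map the PTX opcode to the LLVM IR instruction with the same
    // wrapping 32-bit integer semantics.
    std::string irOp;
    if (opcode == "add.s32") irOp = "add";
    else if (opcode == "sub.s32") irOp = "sub";
    else if (opcode == "mul.lo.s32") irOp = "mul";
    else return "";

    if (operands.size() != 3) return "";

    // PTX puts the destination first: "op d, a, b;" becomes "d = op i32 a, b".
    return operands[0] + " = " + irOp + " i32 " + operands[1] + ", " + operands[2];
}
```

Calling `lowerPtxToIr("add.s32 %r0, %r1, %r2;")` gives `%r0 = add i32 %r1, %r2`. Once the instruction is ordinary IR, the optimizer and the AMD backend treat it like any other add, which is the sense in which it gets "compiled forward".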
I believe nvcc is roughly an antique clang build hacked out of all recognition. I remember it rejecting templates with 'I' as the type name and working when changing to 'T', nonsense like that. The HIP language probably corresponds pretty closely to clang's CUDA implementation in terms of semantics (a lot of the control flow in clang treats them identically), but I don't believe an exact match to nvcc was considered particularly necessary for the `clang -x cuda` work.
The PTX to LLVM IR approach is clever. I think upstream would be game for that; feel free to tag me on reviews if you want to get that divergence out of your local codebase.