by PresidentZippy
837 days ago
Is there anything more to it than that? If not, the documentation would be a lot more helpful if it led with something straight to the point. Here's something that could go directly under the title of the README: "This compiler consists of a custom LLVM frontend and backend. The backend compiles LLVM IR code into machine code consisting of x86_64 instructions and Intel GPU code. The frontend works in conjunction with the backend to compile C and C++ code with a special optimization which, when enabled, compiles OpenMP routines into hardware-accelerated code targeting Intel GPUs, FPGAs, or AMD and NVIDIA GPUs." As someone who only used OpenMP academically, I don't see much of a point in that. In the post-C++11 world, where we can write type-safe compile-time code, preprocessor macro definitions should stay in C code. Until Intel GPUs are at least competitive with the big boys, interop with their products doesn't concern me a whole hell of a lot. I'm not going to plan my scientific computing applications around the integrated graphics found on cheap Wintel consumer devices.
|
It's worth noting that OpenMP pragmas are a totally different thing to C preprocessor macros. A pragma like 'omp target parallel for' means something like "take the following loop, build a GPU kernel out of it, arrange for data to be copied back and forth and to launch that kernel when control flow gets here, and arrange to link in all the OpenMP libraries and also run a bunch of compiler optimisations". A macro means "replace these tokens with these other ones".
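To make the contrast concrete, here is a minimal sketch putting a macro and a pragma side by side. The `SQUARE` macro and the `saxpy` helper are names made up for illustration, and the `map` clauses are one plausible way to describe the data movement, not the only one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A macro is pure token replacement, done before the compiler proper runs:
// SQUARE(a + 1) becomes ((a + 1) * (a + 1)) and nothing more happens.
#define SQUARE(x) ((x) * (x))

// Hypothetical helper: y[i] = a * x[i] + y[i] for every element.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const float* xp = x.data();
    float* yp = y.data();
    const std::size_t n = x.size();

    // The pragma is a request to the compiler, not a text substitution.
    // With OpenMP offloading enabled, the loop below is turned into a kernel,
    // xp is copied to the device, and yp is copied there and back.
    // With OpenMP disabled, the pragma is ignored and the loop runs as
    // ordinary serial code with the same result.
    #pragma omp target parallel for map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

Calling `saxpy(2.0f, x, y)` with `x = {1, 2, 3}` and `y = {10, 20, 30}` leaves `y` holding `{12, 24, 36}` whether or not the compiler was asked to honour the pragma, which is part of the appeal: the annotated source is still plain C++.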
OpenMP is essentially a really big runtime library dealing with threads, scheduling execution, running code on GPUs and so forth. It is sort-of usable in that form. If you're determined then making calls directly into libomp.so and libomptarget.so will make your will a reality. All the pragma syntax is about transforming application code into a lot of calls into that library with appropriately constructed tables of data. And then the compiler works hard to optimise this, e.g. removing calls that don't need to happen, simplifying others, deduplicating some.
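As a rough illustration of that transformation, the sketch below does by hand what a compiler might do for a simple parallel loop: pack the captured variables into a struct, move the loop body into a standalone function, and hand both to a runtime entry point. Everything here (`LoopArgs`, `outlined_body`, `toy_fork_call`, `double_all`) is hypothetical, and `toy_fork_call` is a stand-in that runs the chunks one after another to stay dependency-free. The real runtime has its own entry points (LLVM's libomp uses `__kmpc_fork_call`, for example) and would dispatch the chunks to worker threads.

```cpp
#include <cstddef>
#include <vector>

// Captured variables, packed into the kind of table the compiler would build.
struct LoopArgs {
    const int* in;
    int* out;
    std::size_t n;
};

// Signature of an outlined loop body: worker id, worker count, packed args.
using OutlinedFn = void (*)(std::size_t, std::size_t, void*);

// The loop body, "outlined" into a free function. Each worker handles its
// own contiguous slice of the iteration space.
void outlined_body(std::size_t tid, std::size_t nthreads, void* raw) {
    auto* args = static_cast<LoopArgs*>(raw);
    const std::size_t chunk = (args->n + nthreads - 1) / nthreads;
    const std::size_t begin = tid * chunk;
    const std::size_t end = begin + chunk < args->n ? begin + chunk : args->n;
    for (std::size_t i = begin; i < end; ++i) {
        args->out[i] = 2 * args->in[i];
    }
}

// Stand-in for the runtime library (NOT a real libomp function): it calls
// the outlined body once per worker id, serially.
void toy_fork_call(OutlinedFn fn, std::size_t nthreads, void* args) {
    for (std::size_t tid = 0; tid < nthreads; ++tid) {
        fn(tid, nthreads, args);
    }
}

// What the user wrote as "parallel for: out[i] = 2 * in[i]" ends up as:
// build the argument table, then make one call into the runtime.
std::vector<int> double_all(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    LoopArgs args{in.data(), out.data(), in.size()};
    toy_fork_call(outlined_body, 4, &args);
    return out;
}
```

Once the code is in this shape, the optimisations mentioned above become ordinary compiler work: a runtime call whose result is unused can be dropped, and two adjacent calls with the same argument table can be merged.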
Syntactically OpenMP is a really good fit for Fortran. The invocations look completely appropriate there. For C++, it does tend to upset the sensibilities of the programmers. I personally think it's wildly funny for people who are content with the syntactic horror show of C++ to decide the OpenMP extensions are ugly but there we go, normalisation of deviance and all that.
On a more philosophical level, and what drew me to implementing OpenMP originally, CUDA is a problem. Not only in the vendor lock-in to NVIDIA sense - it's also a deeply nasty language to program with. I especially dislike the warp intrinsics - they take a bitmap corresponding to the CFG of your program, which you are supposed to compute manually (across branches, loops and so forth) and pass around into library functions. GPUs are excellent machines and I want to be able to program them in something which is not CUDA.