Hacker News
by jfumero 2294 days ago
Totally agree. OpenCL code is portable, but its performance is not. That's why TornadoVM specializes the OpenCL code depending on the target device. For FPGAs we apply many more optimizations than for GPUs, such as tuning the thread scheduling, better loop unrolling and loop flattening, use of local memory, etc. All of these optimizations are performed automatically on the compiler IR (Graal IR) before generating the actual OpenCL C code.

With those compiler specializations, we aim to close the performance gap between hand-tuned code and generated code.
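
To make two of the optimizations mentioned above a bit more concrete, here is a rough hand-written sketch of what loop flattening and loop unrolling do to a simple kernel. This is plain Java written for illustration only: the class `LoopTransforms`, its method names, and the unroll factor of 4 are all assumptions, and none of this is TornadoVM's API or its actual Graal IR phases. The compiler applies such transformations on the IR, not on source code.

```java
import java.util.Arrays;

// Hypothetical illustration: hand-written analogues of two loop
// transformations. Not TornadoVM code and not a Graal IR phase.
final class LoopTransforms {

    // Baseline: nested loops over a rows x cols matrix in row-major order.
    static void scaleNested(float[] data, int rows, int cols, float factor) {
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                data[i * cols + j] *= factor;
            }
        }
    }

    // Loop flattening: the two nested loops collapse into a single loop
    // over rows * cols elements, removing the inner-loop control overhead.
    static void scaleFlattened(float[] data, int rows, int cols, float factor) {
        int total = rows * cols;
        for (int k = 0; k < total; k++) {
            data[k] *= factor;
        }
    }

    // Loop unrolling by an assumed factor of 4 on the flattened loop.
    // A remainder loop handles element counts that are not a multiple of 4.
    static void scaleUnrolled(float[] data, int rows, int cols, float factor) {
        int total = rows * cols;
        int k = 0;
        for (; k + 3 < total; k += 4) {
            data[k] *= factor;
            data[k + 1] *= factor;
            data[k + 2] *= factor;
            data[k + 3] *= factor;
        }
        for (; k < total; k++) {
            data[k] *= factor;
        }
    }

    public static void main(String[] args) {
        int rows = 3, cols = 5;
        float[] a = new float[rows * cols];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
        }
        float[] b = a.clone();
        float[] c = a.clone();

        scaleNested(a, rows, cols, 2.0f);
        scaleFlattened(b, rows, cols, 2.0f);
        scaleUnrolled(c, rows, cols, 2.0f);

        // All three variants compute the same result; only the loop shape differs.
        System.out.println(Arrays.equals(a, b) && Arrays.equals(a, c));
    }
}
```

All three variants produce identical results, which is the point: the transformations change the loop structure the hardware sees, not the semantics. How much each one helps depends on the target device, which is why the specialization is done per device.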