|
|
|
|
|
by jfumero
2294 days ago
|
|
Totally agree. OpenCL code is portable, but performance is not. That's why TornadoVM specializes the OpenCL code depending on the target device. For FPGAs we do a lot more optimizations compared to GPUs, such as tuning the thread-scheduling, better loop unrolling and loop flattening, use of local memory, etc. All of these optimizations are automatically performed in the compiler-IR (GraalIR) before generating the actual OpenCL C code. With those compiler specializations, we aim to close the performance gap between hand-tuned code and generated code. |
|