	by jfumero 2294 days ago
	Totally agree. OpenCL code is portable, but its performance is not. That's why TornadoVM specializes the OpenCL code it generates for the target device. For FPGAs we apply many more optimizations than for GPUs, such as tuning thread scheduling, better loop unrolling and loop flattening, use of local memory, etc. All of these optimizations are performed automatically on the compiler IR (Graal IR) before the actual OpenCL C code is generated. With these compiler specializations, we aim to close the performance gap between hand-tuned and generated code.
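
	To make the idea of device-specific specialization concrete, here is a toy sketch in Java. It is not TornadoVM's code generator and does not use its API or Graal IR. The class `KernelSpecializer`, the `Device` enum, the method `emitVectorAdd`, and the unroll factor of 4 are all hypothetical names and values chosen for illustration. It only shows how one logical kernel might be emitted as different OpenCL C source depending on the target.

```java
/** Toy illustration only: NOT TornadoVM's code generator or API. */
final class KernelSpecializer {

    /** Hypothetical device categories used by this sketch. */
    enum Device { GPU, FPGA }

    private static final String SIGNATURE =
        "__kernel void vadd(__global const float *a,\n"
      + "                   __global const float *b,\n"
      + "                   __global float *c) {\n";

    /** Emits OpenCL C source for a vector add of n elements, specialized per device. */
    static String emitVectorAdd(Device device, int n) {
        StringBuilder src = new StringBuilder();
        if (device == Device.FPGA) {
            // FPGA variant: a single work-item runs the whole loop, and the
            // loop is partially unrolled so the hardware pipeline gets wider.
            // The factor 4 is an arbitrary assumption, not a tuned value.
            src.append("__attribute__((reqd_work_group_size(1, 1, 1)))\n");
            src.append(SIGNATURE);
            src.append("    #pragma unroll 4\n");
            src.append("    for (int i = 0; i < ").append(n).append("; i++) {\n");
            src.append("        c[i] = a[i] + b[i];\n");
            src.append("    }\n");
        } else {
            // GPU variant: one work-item per element, with a bounds guard in
            // case the global size is rounded up past n.
            src.append(SIGNATURE);
            src.append("    int i = get_global_id(0);\n");
            src.append("    if (i < ").append(n).append(") {\n");
            src.append("        c[i] = a[i] + b[i];\n");
            src.append("    }\n");
        }
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        // Print both variants so the difference is visible side by side.
        System.out.println(emitVectorAdd(Device.GPU, 1024));
        System.out.println(emitVectorAdd(Device.FPGA, 1024));
    }
}
```

	The sketch works on strings for brevity. Doing the transformations on the compiler IR, as described above, means the same unrolling or scheduling decisions can be made once and reused regardless of how the final OpenCL C text is printed.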