by spenczar5 862 days ago
We are so, so, so far away from compilers that could automatically help you, say, rewrite an operation to achieve high warp occupancy. These are not trivial performance optimizations: sometimes the algorithm itself fundamentally changes when you target the CUDA runtime, because of complexities in the scheduler and the memory subsystem.
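
To make "warp occupancy" a bit more concrete, here is a rough Python sketch of how per-thread register use and per-block shared memory cap the number of warps an SM can keep resident. The per-SM limits below are assumed illustrative values, not figures for any specific GPU (real limits vary by architecture), and the model ignores allocation granularity, which NVIDIA's own occupancy tools do account for. The function name `occupancy` is hypothetical.

```python
import math

# Assumed per-SM limits for illustration only; real values depend on the GPU.
MAX_WARPS_PER_SM = 64
MAX_BLOCKS_PER_SM = 32
REGISTERS_PER_SM = 65536
SHARED_MEM_PER_SM = 102400  # bytes
WARP_SIZE = 32


def occupancy(threads_per_block, regs_per_thread, smem_per_block):
    """Fraction of the SM's warp slots a kernel can keep resident.

    Simplified model: ignores register and shared-memory allocation granularity.
    """
    if threads_per_block <= 0 or regs_per_thread <= 0:
        raise ValueError("threads_per_block and regs_per_thread must be positive")
    warps_per_block = math.ceil(threads_per_block / WARP_SIZE)
    # Each resource independently caps how many blocks fit on one SM.
    limit_warps = MAX_WARPS_PER_SM // warps_per_block
    limit_regs = REGISTERS_PER_SM // (regs_per_thread * warps_per_block * WARP_SIZE)
    if smem_per_block > 0:
        limit_smem = SHARED_MEM_PER_SM // smem_per_block
    else:
        limit_smem = MAX_BLOCKS_PER_SM
    blocks = min(MAX_BLOCKS_PER_SM, limit_warps, limit_regs, limit_smem)
    return blocks * warps_per_block / MAX_WARPS_PER_SM


print(occupancy(256, 32, 0))      # 1.0
print(occupancy(256, 64, 0))      # 0.5: doubling registers halves resident warps
print(occupancy(256, 32, 49152))  # 0.25: shared memory becomes the limit
```

Even in this toy model, a small change in register or shared-memory use moves a different resource into the binding constraint, and recovering occupancy can mean restructuring the kernel rather than just tuning it.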

I think there is no way that you will see compilers that advanced within 3 years, sadly.