|
|
|
|
|
by frogblast
258 days ago
|
|
Often not elusive bugs, but elusive performance. GPU compilers are hard: Once you've done the basics, trying to do further transforms in a mature compiler will almost always produced mixed results. Some kernels will go faster, some will go slower, and you're hoping to move the balance and not hit any critical kernel too hard in your efforts to make another go faster. An optimization with a universal >=0 speedup across your entire suite of tests is a really hard thing to come by. Something is always going to have a negative speedup. My experience is with non-Nvidia GPU systems, but this feels like a familiar situation. They probably found something that has great outcomes for one set of kernels, terrible outcomes for another, and no known reliable heuristic or modeling they could use to automatically choose. |
|