|
|
|
|
|
by 9q9
2254 days ago
|
|
Thanks. How to abstract "hardware primitives" in a way that can be instantiated to many GPU architectures without performance penalty and be useful for higher-level programming is not so clear. How would you, to take an example from the CPU world, fruitfully abstract the CPU's memory model? As far as I'm aware that's not a solved problem in April 2020, and write papers on this subject are still appearing in top conferences . |
|
Unfortunately, this language is from 2007, and the state-of-the art has not really been improved since then (except for the evolution of CUDA proper).
It would be cool for people to work on improving the state of the art, providing languages that are simpler, perform better, or are more high-level than CUDA, without sacrificing anything (i.e. true improvements).
There have been some notable experiments in this regard, e.g., Sequoia for roadrunner scratchpads had very interesting abstraction capabilities over the cache hierarchy, that could have led to a net total improvement over CUDA __shared__ memory.
Most newer languages have the right goal of trying to simplify CUDA, but they end up picking trade-offs that only allow them to do so by sacrificing a lot of performance. That's not an interesting proposition for most CUDA developers - the reason they pick up CUDA is performance, and sacrificing ~5% might be acceptable if the productivity gains are there, but a 20-30% perf loss isn't acceptable - too much money involved.
One can simplify CUDA while retaining performance by restricting a language to GPUs and a particular domain, e.g., computer graphics / gfx shaders or even stencil codes or map-reduce, etc.
However, a lot of the widely advertised languages try to (1) target both GPUs and CPUs, compromising on some minimum common denominator of features, (2) support general-purpose compute kernels, and (3) significantly simplify CUDA, often doing so by just removing the explicit memory transfers, which compromises performance. These languages are quite neat, but they aren't really practical, because they aren't really true improvements over CUDA.
Most companies I work with that use CUDA today are using the state of the art (C++17 or 20 extensions), experimental libraries, drivers, etc. So it isn't hard to sell them into a new technology, _if it is better than what they are already using_.