by wujingyue
3704 days ago
Thanks for your interest, and I hope you like it! Yes, it is currently incomplete, but I'd say at least 80% of the optimizations have already been upstreamed. Folks in the LLVM community are also actively working on it. For example, Justin Lebar recently pushed http://reviews.llvm.org/D18626, which added the speculative execution pass to -O3. Regarding performance, it's worth noting that a single missing optimization does not necessarily cause a significant slowdown on the benchmarks you care about. For example, the memory-space alias analysis only noticeably affects one benchmark in the Rodinia benchmark suite.

Regarding your second question, the short answer is no. The Clang/LLVM version uses a different architecture from the internal version (as described in http://wujingyue.com/docs/gpucc-talk.pdf). The LLVM version offers better functionality and faster compilation, and it is much easier to maintain and improve going forward. Upstreaming the internal version would take even more effort than making all the optimizations work with the new architecture.

I don't have a lot of benchmarks at the moment, so I can't say how important they are. And of course it depends on what you're doing.
Clang/LLVM's CUDA implementation shares most of its backend with gpucc, but it has an entirely new front-end. The front-end works for TensorFlow, Eigen, and Thrust, but I suspect that if you try hard enough you'll be able to find something nvcc accepts that we can't compile. At the moment we're pretty focused on making it work well for TensorFlow.