by jlebar
3427 days ago
> Do you assume you always have CUDA sources for ML operations in XLA? I was under the impression that closed-source libraries like cuDNN were used.

Yes, XLA calls into cuDNN and cuBLAS. It's not a fundamental architectural thing, though; those are just the fastest matmul etc. kernels we currently have access to.

> Is it possible to accurately evaluate the profitability of fusing two kernels in CUDA (effects of increased register pressure; shared memory)?

For a human, yes, sure: just time both options. The system doesn't currently do this in an automated fashion, though. Much like a CPU compiler's inliner, it has heuristics and makes its best guess. In general, fusion is very profitable.

> On the other hand, the generic kernel and its launch parameters were probably hand tuned for performance.

Yes, and this is one of the ways that XLA can lose to (say) vanilla TensorFlow today. But it's just a matter of tuning; the system is very young.
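
To make "just time both options" concrete, here is a toy sketch in plain Python, not XLA or CUDA. The names `unfused`, `fused`, and `time_both` are made up for illustration. It only shows the shape of the experiment: one version materializes an intermediate result between two passes, the other does both operations in a single pass. It says nothing about register pressure or shared memory, which is where the real GPU trade-off lives.

```python
import timeit


def unfused(xs):
    # Two separate passes. The first materializes an intermediate list,
    # loosely analogous to one kernel writing its output to memory so a
    # second kernel can read it back.
    scaled = [2.0 * x for x in xs]
    return [s + 1.0 for s in scaled]


def fused(xs):
    # One pass: both operations are applied per element, so no
    # intermediate list is created.
    return [2.0 * x + 1.0 for x in xs]


def time_both(n=100_000, repeats=5):
    # Hypothetical helper: returns (unfused_seconds, fused_seconds),
    # taking the best of `repeats` measurements of 10 calls each.
    xs = [float(i) for i in range(n)]
    assert unfused(xs) == fused(xs)  # both variants compute the same thing
    t_unfused = min(timeit.repeat(lambda: unfused(xs), number=10, repeat=repeats))
    t_fused = min(timeit.repeat(lambda: fused(xs), number=10, repeat=repeats))
    return t_unfused, t_fused


if __name__ == "__main__":
    t_unfused, t_fused = time_both()
    print(f"unfused: {t_unfused:.4f}s  fused: {t_fused:.4f}s")
```

Which variant wins, and by how much, depends on the machine and the input size, so the numbers are only meaningful when measured on the target you care about.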