Hacker News new | ask | show | jobs
by orf 634 days ago
Having a common library function that is either lighting fast or dog slow depending on the hardware, is not a great position to be in.

Moreover, this will get worse as more CUDA specific features are added to PyTorch with ad-hoc fallback functions.

I guess OP is saying that XLA is more transparent in this regard, because it wouldn’t use functions like these and the generated comparable code would be on-pare performance wise?

1 comments

> it wouldn’t use functions like these and the generated comparable code would be on-pare performance wise

Perhaps if XLA generated all functions from scratch, this would be more compelling. But XLA relies very heavily on pattern-matching to common library functions (e.g. CuDNN), and these patterns will certainly work better on Nvidia GPUs than AMD GPUs.

In this way, I actually think explicitly calling the common library functions is actually much more transparent.