Hacker News
by htirwklj4523432 1039 days ago
The numbers look amazing.

Can you comment on how difficult it was to achieve this, and what the relative advantages b/w cards? AFAIR, AMD cards were not deemed competitive with Nvidia in the DL space, largely because of the amazing job Nvidia pulled off with cuDNN and its conv. kernels.

LLMs etc. OTOH don't really depend on convolutions (at least the pure transformer bits), and instead depend a lot more on plain old GEMM + low-bit float/int compute.
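
To make "GEMM + low-bit int compute" concrete, here is a toy sketch in pure Python. It assumes symmetric per-tensor int8 quantization, which is only one of several possible schemes and not necessarily what any particular stack uses. The helper names (`quantize`, `int_gemm`, `quantized_matmul`) are made up for illustration, and real GPU kernels would run the integer multiply-accumulate on dedicated hardware rather than in Python loops.

```python
def quantize(matrix, num_bits=8):
    """Symmetric per-tensor quantization: floats -> signed ints plus one scale.

    Hypothetical helper for illustration only.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for row in matrix for v in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp into [-qmax, qmax].
    q = [[max(-qmax, min(qmax, round(v / scale))) for v in row] for row in matrix]
    return q, scale


def int_gemm(a, b):
    """Plain matrix multiply on integers, accumulating in Python ints."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def quantized_matmul(a, b, num_bits=8):
    """Quantize both operands, multiply as integers, then rescale to floats."""
    qa, scale_a = quantize(a, num_bits)
    qb, scale_b = quantize(b, num_bits)
    acc = int_gemm(qa, qb)
    return [[v * scale_a * scale_b for v in row] for row in acc]


def float_matmul(a, b):
    """Reference float GEMM, used to check the quantization error."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


if __name__ == "__main__":
    a = [[0.5, -1.0], [2.0, 0.25]]
    b = [[1.0, 0.0], [-0.5, 0.75]]
    print("float :", float_matmul(a, b))
    print("int8  :", quantized_matmul(a, b))
```

The point is just that the heavy lifting is an integer matrix multiply followed by one rescale, so the result is close to the float GEMM but not identical.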

1 comment

> Can you comment on how difficult it was to achieve this, and what the relative advantages b/w cards?

Thanks for asking! I personally believe TVM Unity is a proper software stack for ML compilation (MLC), and its existing optimizations (e.g. TensorCore offloading) can be transparently transferred to AMD/Intel/Apple/mobile GPUs without too much engineering effort.

Of course my claim is limited to ML workloads. Not an expert outside the ML world, so I couldn't say for general HPC.