HN Mirror

	by haffi112 974 days ago
	Isn't that going to be a problem with no NVIDIA GPUs?

3 comments

mikk14 974 days ago

Funnily enough, just today I launched a big job on LUMI that I had also started on another, smaller cluster with NVIDIA GPUs. Basically, I'm running Llama2-70B to do zero-shot text classification. The NVIDIA setup uses 4 A100s, while on LUMI I could access 6 MI250Xs.

Unfortunately, it is not an apples-to-apples comparison: on the NVIDIA cluster I'm running a quantized 34B version via llama-cpp-python, while on LUMI I'm running the official, non-quantized 70B version via the transformers library.
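For a rough sense of why the two setups differ, weight memory is approximately parameter count times bytes per parameter. The sketch below assumes 2 bytes per parameter (fp16/bf16) for the non-quantized 70B model and about 0.5 bytes (4-bit) for the quantized 34B one. The comment doesn't state the quantization level, so that value is an assumption, and KV cache and activations are ignored.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights only, in GB (10**9 bytes)."""
    return n_params * bytes_per_param / 1e9


# Assumption: unquantized weights stored in fp16/bf16 (2 bytes per parameter).
full_70b = weight_memory_gb(70e9, 2.0)
# Assumption: roughly 4-bit quantization (0.5 bytes per parameter); the actual
# quantization level used on the NVIDIA cluster is not stated.
quant_34b = weight_memory_gb(34e9, 0.5)

print(f"70B fp16 weights:  ~{full_70b:.0f} GB")
print(f"34B 4-bit weights: ~{quant_34b:.0f} GB")
```

Under those assumptions the quantized 34B model needs roughly an eighth of the weight memory (about 17 GB versus 140 GB), which is part of why the two throughput figures aren't directly comparable.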

Long story short, I'm getting 7.5x higher throughput on LUMI than on the NVIDIA cluster (which means each card is 5x faster on LUMI).
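The per-card figure follows from scaling the aggregate ratio by the card counts: 7.5 times 4/6 = 5. A minimal sketch of that arithmetic, assuming throughput scales linearly with the number of cards (the helper name is hypothetical):

```python
def per_card_speedup(total_speedup: float, cards_baseline: int, cards_other: int) -> float:
    """Convert an aggregate throughput ratio into a per-card ratio.

    total_speedup is throughput(other) / throughput(baseline) for the whole job.
    Assumes throughput scales linearly with the number of cards.
    """
    return total_speedup * cards_baseline / cards_other


# Figures from the comment: 7.5x aggregate, 4 A100s (baseline) vs 6 MI250Xs.
print(per_card_speedup(7.5, cards_baseline=4, cards_other=6))
```

Note that ROCm exposes each MI250X as two devices (GCDs), so if "6 MI250Xs" actually meant 12 GCDs, the per-device ratio would be `per_card_speedup(7.5, 4, 12)`, i.e. 2.5x.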

Edit: The AMD GPUs work fine because one can run the ROCm build of PyTorch, which depends on the pytorch-triton-rocm package.
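A quick way to check which build is installed: PyTorch's ROCm builds set `torch.version.hip` and keep the `torch.cuda` API, so code written for `"cuda"` devices (as typical transformers scripts are) generally runs unchanged on AMD GPUs. A minimal sketch; the helper name is hypothetical, and it returns a message instead of failing when PyTorch is absent:

```python
def torch_backend() -> str:
    """Report which GPU backend the installed PyTorch build targets."""
    try:
        import torch
    except ImportError:
        return "PyTorch not installed"
    # ROCm builds set torch.version.hip; CUDA builds set torch.version.cuda.
    if getattr(torch.version, "hip", None):
        return f"ROCm/HIP {torch.version.hip}"
    if getattr(torch.version, "cuda", None):
        return f"CUDA {torch.version.cuda}"
    return "CPU-only build"


if __name__ == "__main__":
    print(torch_backend())
```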

haffi112 973 days ago

Thanks, that's great to know. Do you know how they would compare if you performed training instead of inference?

mikk14 973 days ago

Unfortunately I haven't tried it, and I don't have enough data to make an educated guess. I played around a bit with training and it seemed somewhat slow, but the testing environment isn't representative of the speed of submitted jobs: even my inference was pretty slow there, while the runtimes were very different once the job was submitted.

thiago_fm 974 days ago

You can run LLMs on your own machine. Do you think a supercomputer would have issues? CUDA has optimizations, but you don't necessarily need it to do inference at all.

Those supercomputers are extremely powerful. They might not be as energy efficient as H100s, but they do the job.
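To illustrate that CUDA isn't a hard requirement for inference, here is a minimal device-selection sketch, assuming PyTorch is the framework in use (the helper name is hypothetical). It prefers a GPU when one is visible and otherwise falls back to the CPU, which is slower but still works:

```python
def pick_device() -> str:
    """Choose an inference device, falling back to CPU if no accelerator is found."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # True for both NVIDIA (CUDA) and AMD (ROCm) builds of PyTorch.
    if torch.cuda.is_available():
        return "cuda"
    # Apple-silicon GPUs; the mps backend only exists in newer PyTorch releases.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```

A model or tensor can then be moved with `.to(pick_device())`.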

0xDEF 974 days ago

There has been a lot of recent progress in making PyTorch AMD-compatible, precisely because many government and university supercomputers are based on AMD GPUs.