|
|
|
|
|
by sigmoid10
813 days ago
|
|
>On GPUs, ML "just works" If you had worked with ML, you'd know that this is not true. It's actually more like the opposite. It also has nothing to do with the chips themselves. Things don't magically work "because GPU", they work because manufacturers spend the time getting their drivers and ecosystems right. That's why for example noone is using AMD GPUs for ML, despite them offering more compute per dollar on paper. Getting the software stack to the point of Nvidia/CUDA, where things really do "just work", is an enormous undertaking. And as someone who has been researching ML for more than a decade now, I can tell you Nvidia also didn't get these things right in the beginning. That's the reason why they have no real competition today (and still won't for quite some time). |
|
You're right, they are behind, but to say that nobody is using it, is not truthful. AMD HPC clusters are being used [0] and [1] for AI/ML.
The larger issue is that AMD has only been building HPC clusters for the last period of time. Now, with the release of MI300x, we have Azure and Oracle coming online with them now. Disclosure, my business is also building a MI300x super computer as well, with the express goal of enabling more access to developers.
[0] https://defensescoop.com/2023/08/23/navys-new-25m-supercompu...
[1] https://arxiv.org/abs/2312.12705