by michaelt
950 days ago
The vast majority of work in ML isn't people working with CUDA directly. People use open source frameworks like PyTorch and TensorFlow to define a network and train it, and all the frameworks support CUDA as a backend. Other backends are also available, such as CPU-only training, and you can export networks in reasonably standard formats.

Nvidia's moat has four parts: much more mature framework support than AMD's cards have; widespread popularity due to that good framework support, which ensures everyone develops on Nvidia and so maintains the support lead; much faster performance than CPU-only training; and a price that, though high, is a lot less than an ML developer's salary.

If you need 24GB of VRAM and Nvidia offers that for $1600 while AMD offers it for $1300, how many compatibility problems do you want to deal with to save a single day's wages?

But Nvidia's moat is far from guaranteed. Huge users like OpenAI and Facebook might find that improving AMD support pays for itself.
At that scale they may actually develop their own hardware, à la Google's TPU.
If you just want to focus on the AI problem and not on infrastructure, use Nvidia. If you want control and efficiency, design your own. For the massive companies, AMD falls into a weird middle ground.