by 1980phipsi
199 days ago
> It is also important to note that, until recently, the GenAI industry’s focus has largely been on training workloads. In training workloads, CUDA is very important, but when it comes to inference, even reasoning inference, CUDA is not that important, so the chances of expanding the TPU footprint in inference are much higher than those in training (although TPUs do really well in training as well – Gemini 3 the prime example).

Does anyone have a sense of why CUDA is more important for training than inference?

Once training is done, you have a feed-forward network made of frozen weights that you can just load and run data through. Those weights can be duplicated across any number of devices, where they just sit and run inference on new data.
If this turns out to be the future use case for NNs (it already is today), then Google is better positioned.
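
To make the "frozen weights" point concrete, here is a rough sketch in plain Python. Everything in it is made up for illustration: the tiny two-layer network, the weight values, and the names `FROZEN`, `dense`, and `infer` are all hypothetical, and plain lists stand in for device memory. It only shows that inference is a fixed function of weights that never change, so the same weights can be copied to as many "devices" as you like.

```python
import copy

# Hypothetical frozen weights for a tiny 2-layer feed-forward network.
# In a real deployment these would be loaded from a trained checkpoint.
FROZEN = {
    "w1": [[1.0, -1.0], [0.5, 2.0]],  # 2 inputs -> 2 hidden units
    "b1": [0.0, 0.0],
    "w2": [[1.0], [1.0]],             # 2 hidden units -> 1 output
    "b2": [0.5],
}

def dense(x, w, b):
    """Compute x @ w + b for a single input vector x."""
    return [
        sum(x[i] * w[i][j] for i in range(len(x))) + b[j]
        for j in range(len(b))
    ]

def relu(v):
    """Element-wise max(0, a)."""
    return [max(0.0, a) for a in v]

def infer(weights, x):
    """Forward pass only: no gradients, no weight updates."""
    hidden = relu(dense(x, weights["w1"], weights["b1"]))
    return dense(hidden, weights["w2"], weights["b2"])

# "Duplicate across devices": each replica holds its own copy of the same
# weights and handles its own input independently of the others.
replicas = [copy.deepcopy(FROZEN) for _ in range(3)]
inputs = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
outputs = [infer(w, x) for w, x in zip(replicas, inputs)]
print(outputs)
```

Because the replicas never write to their weights, they need no coordination with each other, which is the property the comment above is pointing at.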