by vineethy
173 days ago
I think it's important to note that there's nothing forbidding LPU-style determinism from being used in training. They just didn't make that choice. Also, Tenstorrent could be a viable challenger in this space. It seems to me that their NoC and their chips could be mostly deterministic as long as you don't start adding in branches.
|
Groq's chips only have 230MB of SRAM per chip vs. 80GB of HBM on an H100, and training is memory hungry: you need to hold model weights + gradients + optimizer states + intermediate activations.
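To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes mixed-precision Adam at about 16 bytes of state per parameter (fp16 weights and gradients plus fp32 master weights, momentum, and variance) and a hypothetical 7B-parameter model. Neither assumption comes from the thread, and activations are left out entirely, so treat the result as a lower bound rather than a real sizing.

```python
import math

# Assumption (not from the thread): mixed-precision Adam, ~16 bytes per parameter:
# 2 (fp16 weights) + 2 (fp16 gradients) + 12 (fp32 master weights, momentum, variance).
BYTES_PER_PARAM = 2 + 2 + 12

GROQ_SRAM_BYTES = 230 * 10**6  # 230MB of SRAM per Groq chip, figure quoted above
H100_MEM_BYTES = 80 * 10**9    # 80GB per H100, figure quoted above


def training_state_bytes(num_params: int) -> int:
    """Weights + gradients + optimizer states. Activations are NOT included."""
    return num_params * BYTES_PER_PARAM


def chips_needed(num_params: int, bytes_per_chip: int) -> int:
    """Minimum chip count just to hold the training state, ignoring all overhead."""
    return math.ceil(training_state_bytes(num_params) / bytes_per_chip)


if __name__ == "__main__":
    params = 7 * 10**9  # hypothetical 7B-parameter model
    print(training_state_bytes(params) / 1e9, "GB of training state")
    print(chips_needed(params, GROQ_SRAM_BYTES), "Groq chips")
    print(chips_needed(params, H100_MEM_BYTES), "H100s")
```

Under those assumptions the training state alone is 112GB, which works out to 487 Groq chips versus 2 H100s before a single activation is stored.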