by ossa-ma
173 days ago
You're right, but my understanding is that Groq's LPU architecture makes it inference-only in practice. Groq's chips have only 230MB of SRAM per chip, versus 80GB of HBM on an H100, and training is memory-hungry: you need to hold the model weights, gradients, optimizer states, and intermediate activations all at once.
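A rough back-of-envelope sketch of that memory argument. It assumes mixed-precision training with Adam at about 16 bytes per parameter (a common accounting, not something stated above) and a hypothetical 7B-parameter model. Activations are left out, so real requirements would be higher.

```python
import math

# Assumption: mixed-precision Adam, using the common ~16 bytes/param breakdown.
# Activations are NOT included, so these are lower bounds.
BYTES_PER_PARAM = {
    "half-precision weights": 2,
    "half-precision gradients": 2,
    "fp32 master weights": 4,
    "fp32 Adam first moment": 4,
    "fp32 Adam second moment": 4,
}

GROQ_SRAM_BYTES = 230e6  # 230MB of on-chip SRAM per Groq chip
H100_HBM_BYTES = 80e9    # 80GB of HBM on an H100


def training_state_bytes(num_params: float) -> float:
    """Bytes for weights + gradients + optimizer states (no activations)."""
    return num_params * sum(BYTES_PER_PARAM.values())


def devices_needed(num_params: float, bytes_per_device: float) -> int:
    """Minimum device count just to hold the training state, ignoring overhead."""
    return math.ceil(training_state_bytes(num_params) / bytes_per_device)


if __name__ == "__main__":
    params = 7e9  # hypothetical 7B-parameter model
    print(f"training state: {training_state_bytes(params) / 1e9:.0f} GB")
    print("Groq chips:", devices_needed(params, GROQ_SRAM_BYTES))
    print("H100s:", devices_needed(params, H100_HBM_BYTES))
```

Under these assumptions the training state alone comes to about 112 GB, which would take roughly 487 chips' worth of SRAM versus 2 H100s, before counting activations or any sharding and communication overhead.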