by gatienboquet
478 days ago
LLMs are primarily "memory-bound" rather than "compute-bound" during normal use. The model weights (billions of parameters) must fit in memory before you can use them, and generating each token means reading those weights back out of memory. Think of it like this: even with a very fast chef (a powerful CPU/GPU), if your kitchen counter (VRAM) is too small to lay out all the ingredients, cooking becomes inefficient or impossible. Processing power still matters for speed once everything fits in memory, but it's secondary to having enough VRAM in the first place.
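To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It assumes a dense model generating one token at a time, where every weight is read once per token and costs about 2 FLOPs (a common approximation that ignores the KV cache and attention overhead). The function names and all hardware figures are made up for illustration, not specs of any real chip.

```python
def weight_bytes(n_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights."""
    return n_params * bytes_per_param


def fits_in_memory(n_params: float, bytes_per_param: float, mem_bytes: float) -> bool:
    """The 'kitchen counter' check: do the weights fit at all?"""
    return weight_bytes(n_params, bytes_per_param) <= mem_bytes


def decode_bounds(n_params: float, bytes_per_param: float,
                  mem_bandwidth: float, flops: float):
    """Upper bounds on tokens/s for single-stream generation.

    Assumption: each token reads every weight once (memory limit)
    and spends ~2 FLOPs per parameter (compute limit).
    """
    mem_tps = mem_bandwidth / weight_bytes(n_params, bytes_per_param)
    compute_tps = flops / (2 * n_params)
    regime = "memory-bound" if mem_tps < compute_tps else "compute-bound"
    return mem_tps, compute_tps, regime


if __name__ == "__main__":
    # Hypothetical 7B-parameter model stored in 16-bit (2 bytes per weight).
    params, width = 7e9, 2

    # Hypothetical big GPU: 24 GB memory, 1 TB/s bandwidth, 100 TFLOP/s.
    print(fits_in_memory(params, width, 24e9))
    print(decode_bounds(params, width, mem_bandwidth=1e12, flops=100e12))

    # Hypothetical small chip: 100 GB/s bandwidth, 0.05 TFLOP/s.
    print(decode_bounds(params, width, mem_bandwidth=100e9, flops=0.05e12))
```

Whichever of the two limits is lower is the bottleneck. With plenty of compute the memory limit is far lower, which is the usual case; if compute is scarce enough, the compute limit drops below it instead.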
|
My guess is that these chips could be compute-bound, though, given how little compute capacity they have.