by djsjajah
185 days ago
> Do you really though?

Yes. It stays in HBM, but it needs to get shuffled to the place where the computation actually happens. It's a lot like a normal CPU: the CPU can't do anything with data sitting in system memory; it has to be loaded into a CPU register first.
For every token that is generated, a dense LLM has to read every parameter in the model.
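
To put a rough number on that, here is a back-of-the-envelope sketch. It assumes batch size 1, that every weight is read from memory exactly once per generated token, and that nothing else (KV cache reads, compute, kernel overhead) matters. The function name and the example figures (70B parameters, 2 bytes per parameter, 2 TB/s of memory bandwidth) are made up for illustration and don't describe any particular GPU or model.

```python
def max_tokens_per_second(num_params: float,
                          bytes_per_param: float,
                          bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode speed if memory bandwidth is the only limit.

    Assumes a dense model at batch size 1, so each generated token
    requires reading every parameter once.
    """
    bytes_per_token = num_params * bytes_per_param
    return bandwidth_bytes_per_s / bytes_per_token


# Hypothetical numbers: 70e9 params, 2 bytes each (16-bit weights), 2e12 B/s.
rate = max_tokens_per_second(70e9, 2, 2e12)
print(f"{rate:.1f} tokens/s upper bound")  # prints "14.3 tokens/s upper bound"
```

Under those assumptions the ceiling comes purely from how fast the weights can be streamed, which is the point above: where the weights live matters less than how quickly they can be moved to the compute units.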