by kouteiheika
168 days ago
> What does it mean that only 3B parameters are active at a time?

In a nutshell: LLMs generate tokens one at a time. "only 3B parameters active at a time" means that for each of those tokens only 3B parameters need to be fetched from memory, instead of all of them (30B).
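To put rough numbers on that, here's a back-of-the-envelope sketch. The 30B and 3B figures come from the comment above; the 2 bytes per parameter (16-bit weights) and the 100 GB/s memory bandwidth are assumptions I picked purely for illustration, and the function names are made up. It also pretends weight reads are the only cost, so treat the result as an upper bound on speed, not a benchmark.

```python
# Rough estimate of how many weight bytes must be read per generated token,
# and the resulting upper bound on tokens/second if memory bandwidth is the
# only bottleneck. Ignores compute, KV cache reads, and everything else.

def bytes_per_token(active_params: float, bytes_per_param: float = 2.0) -> float:
    """Weight bytes fetched from memory to produce one token."""
    return active_params * bytes_per_param


def max_tokens_per_second(
    active_params: float,
    bandwidth_bytes_per_s: float,
    bytes_per_param: float = 2.0,
) -> float:
    """Bandwidth-bound ceiling on generation speed."""
    return bandwidth_bytes_per_s / bytes_per_token(active_params, bytes_per_param)


TOTAL_PARAMS = 30e9    # every parameter in the model (from the comment)
ACTIVE_PARAMS = 3e9    # parameters touched per token (from the comment)
BANDWIDTH = 100e9      # ASSUMPTION: hypothetical 100 GB/s memory bandwidth

# If all 30B had to be read for every token:
all_params_tps = max_tokens_per_second(TOTAL_PARAMS, BANDWIDTH)
# If only the 3B active ones are read:
active_only_tps = max_tokens_per_second(ACTIVE_PARAMS, BANDWIDTH)

print(f"reading all 30B per token: {all_params_tps:.1f} tokens/s at most")
print(f"reading only 3B per token: {active_only_tps:.1f} tokens/s at most")
print(f"speedup: {active_only_tps / all_params_tps:.1f}x")
```

The exact tokens/s figures depend entirely on the assumed bandwidth and precision; the part that carries over is the ratio, since reading a tenth of the weights per token means roughly a tenth of the memory traffic.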