by modeless
248 days ago
A model's weights, activations, and caches can fill all the memory you have and more, and to a first (very rough) approximation every byte has to be read once for each token generated. You can see how that would add up. I highly recommend Andrej Karpathy's videos if you want to learn the details.
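
To put rough numbers on "every byte per token", here is a back-of-the-envelope sketch. The model size, precision, cache size, and bandwidth are made-up illustrative values, not measurements of any real system, and the function names are mine. It only captures the first-order approximation above, where generation speed is capped by how fast memory can be streamed.

```python
def bytes_per_token(n_params, bytes_per_param, cache_bytes=0):
    """Approximate bytes read to generate one token.

    Assumes every weight and every cached byte is touched exactly once
    per token, which is the (very rough) first-order approximation.
    """
    return n_params * bytes_per_param + cache_bytes


def max_tokens_per_second(bytes_touched, bandwidth_bytes_per_s):
    """Upper bound on tokens/s if memory bandwidth is the only limit."""
    return bandwidth_bytes_per_s / bytes_touched


if __name__ == "__main__":
    # Hypothetical setup: 7e9 parameters stored at 2 bytes each,
    # plus an assumed 1 GB of cache, on a device with 1 TB/s of bandwidth.
    touched = bytes_per_token(7_000_000_000, 2, cache_bytes=1_000_000_000)
    bound = max_tokens_per_second(touched, 1e12)
    print(f"{touched / 1e9:.1f} GB read per token")
    print(f"at most {bound:.1f} tokens/s from bandwidth alone")
```

Real systems batch requests and overlap compute with memory traffic, so treat this as an intuition pump rather than a prediction.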