by redox99
1216 days ago
> True, but this entire loop happens within the model

No, it's literally a for loop in Python that runs the whole thing from scratch[1] after appending each new token. There are no artificial slowdowns; what you're seeing is literally what it's spitting out in real time.

Here's an example: https://github.com/karpathy/minGPT/blob/master/mingpt/model....

[1] Some things can be cached, such as the attention keys and values (the "KV cache"), but still.
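To illustrate the loop described above, here is a minimal sketch in plain Python. It uses a hypothetical toy "model": `toy_kv`, `toy_next_token` and `VOCAB_SIZE` are made up for this example and are not minGPT's API. Real code runs a full transformer on PyTorch tensors, and this sketch only mirrors the control flow. `generate_naive` recomputes the per-position keys/values for the whole sequence after each appended token. `generate_cached` keeps the ones already computed, as footnote [1] mentions.

```python
from typing import List, Tuple

VOCAB_SIZE = 50  # hypothetical toy vocabulary size


def toy_kv(token: int, pos: int) -> int:
    """Hypothetical stand-in for one position's key/value projection.

    In a real transformer this would be the K and V vectors for one
    position; here it is just an integer depending on (token, position).
    """
    return (token * 31 + pos * 7) % 97


def toy_next_token(kvs: List[int], last_token: int) -> int:
    """Hypothetical stand-in for attention + output head: combine every
    position's key/value with the newest token and pick the next token
    deterministically (greedy)."""
    return (sum(kvs) + last_token) % VOCAB_SIZE


def generate_naive(prompt: List[int], max_new_tokens: int) -> Tuple[List[int], int]:
    """Re-run the whole 'model' over the full sequence after each new token.

    The prompt must be non-empty. Returns (sequence, number of K/V computations).
    """
    seq = list(prompt)
    kv_computations = 0
    for _ in range(max_new_tokens):
        # Recompute K/V for every position from scratch on each step.
        kvs = [toy_kv(tok, pos) for pos, tok in enumerate(seq)]
        kv_computations += len(kvs)
        seq.append(toy_next_token(kvs, seq[-1]))
    return seq, kv_computations


def generate_cached(prompt: List[int], max_new_tokens: int) -> Tuple[List[int], int]:
    """Same loop, but keep K/V for positions already seen (a KV cache).

    The prompt must be non-empty. Returns (sequence, number of K/V computations).
    """
    seq = list(prompt)
    cache: List[int] = []
    kv_computations = 0
    for _ in range(max_new_tokens):
        # Only positions not yet in the cache need to be computed.
        for pos in range(len(cache), len(seq)):
            cache.append(toy_kv(seq[pos], pos))
            kv_computations += 1
        seq.append(toy_next_token(cache, seq[-1]))
    return seq, kv_computations


if __name__ == "__main__":
    prompt = [3, 1, 4]
    naive_seq, naive_work = generate_naive(prompt, 5)
    cached_seq, cached_work = generate_cached(prompt, 5)
    print(naive_seq == cached_seq)  # True: same tokens either way
    print(naive_work, cached_work)  # 25 7
```

Both versions produce the same tokens. The cache only avoids redoing work for earlier positions: 3+4+5+6+7 = 25 computations versus 3+1+1+1+1 = 7 here. Even with the cache, the Python loop still calls the model once per generated token, which is why output appears token by token.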