by redox99 1216 days ago
> True, but this entire loop happens within the model

No, it's literally a for loop in Python that runs the whole model from scratch[1] after appending each new token. There are no artificial slowdowns; what you're seeing is literally what it's spitting out in real time.

Here's an example:

https://github.com/karpathy/minGPT/blob/master/mingpt/model....

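[1] Some things can be cached, such as the attention keys and values (the KV cache), so it's not strictly all recomputed, but the outer loop is still one model call per token.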
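
To make the shape of that loop concrete, here's a rough sketch. This is not the minGPT code: `toy_model` is a made-up stand-in for a real forward pass, and the greedy pick is an assumption for simplicity (real implementations usually sample from a softmax over the logits). The point is just that the model gets called on the full sequence once per new token, and each token is available as soon as it's computed.

```python
from typing import Callable, Iterator, List


def toy_model(tokens: List[int], vocab_size: int = 10) -> List[float]:
    """Hypothetical stand-in for a transformer forward pass.

    It receives the ENTIRE sequence on every call and returns one score
    per vocabulary entry for the next token.
    """
    favored = (sum(tokens) * 3 + len(tokens)) % vocab_size
    return [1.0 if i == favored else 0.0 for i in range(vocab_size)]


def generate(
    model: Callable[[List[int]], List[float]],
    prompt: List[int],
    max_new_tokens: int,
) -> Iterator[int]:
    """Plain autoregressive loop: one full model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = model(tokens)  # rerun the model on everything so far
        # Greedy choice (assumption): take the highest-scoring token.
        next_token = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_token)  # append, then go around again
        yield next_token  # emitted immediately, hence the "streaming" look


if __name__ == "__main__":
    for tok in generate(toy_model, [1, 2, 3], 3):
        print(tok)
```

A KV cache would change what happens inside `model` (reusing work for the earlier positions), but not the outer loop.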