Hacker News new | ask | show | jobs
by EricLeer 1216 days ago
True, but this entire loop happens within the model, if you would return an output at every intermediate step the model would be extremely slow.

My take on why they have build the output layer like it is, is that next to feeling more human, it also forces you to be a bit more thoughtfull with your requests, and thus spam the system less. In the end it is still really expensive to run these models..

1 comments

> True, but this entire loop happens within the model

No, it's literally a for loop in python that runs the whole thing from scratch[1] after appending each new token. No artificial slowdowns, what you're seeing is literally what it's spitting out in real time

Here's an example

https://github.com/karpathy/minGPT/blob/master/mingpt/model....

[1] Some things can be cached such as kv, but still