by sadhorse 815 days ago
Does every token require a full model computation?
1 comment

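No, you can cache some of the work you did when processing the previous tokens. In a transformer decoder, the attention keys and values computed for earlier tokens do not change when a new token arrives, so they can be stored (the "KV cache") and reused; only the newest token has to be pushed through the model. This is one of the key optimization ideas designed into the architecture.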
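
To make the caching idea concrete, here is a minimal sketch of a single causal attention head in NumPy. It is only an illustration under simplifying assumptions (one head, one layer, random weights, no positional encoding), and the names `KVCache`, `step`, and `full_attention` are made up for this example rather than taken from any library. It compares recomputing attention over the whole sequence with processing one token at a time against cached keys and values.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def full_attention(X, Wq, Wk, Wv):
    """Recompute causal self-attention for every token from scratch."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Causal mask: token i may only attend to tokens 0..i.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    return softmax(scores) @ V

class KVCache:
    """Hypothetical helper: stores keys/values of already-processed tokens."""
    def __init__(self):
        self.K = []
        self.V = []

    def step(self, x, Wq, Wk, Wv):
        # Only the new token is projected; earlier K/V rows are reused.
        q = x @ Wq
        self.K.append(x @ Wk)
        self.V.append(x @ Wv)
        K = np.stack(self.K)
        V = np.stack(self.V)
        scores = K @ q / np.sqrt(K.shape[-1])
        return softmax(scores) @ V

rng = np.random.default_rng(0)
d = 8                                   # toy embedding size (assumption)
X = rng.normal(size=(5, d))             # 5 toy token embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

cache = KVCache()
incremental = np.stack([cache.step(x, Wq, Wk, Wv) for x in X])
reference = full_attention(X, Wq, Wk, Wv)

# Both routes should agree up to floating-point error.
print(np.allclose(incremental, reference))
```

Note that the new token still attends over all cached keys, so the per-token cost grows with sequence length; what the cache saves is re-projecting and re-processing every earlier token at each step.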