Hacker News new | ask | show | jobs
by cjbprime 842 days ago
I think the answer's in the original question: the provider has to pay for extra storage to cache the model state at the prompt you're asking to snapshot. But it's not necessarily a net increase in costs for the provider, because in exchange for doing so they (as well as you) are getting to avoid many expensive inference rounds.
1 comments

Isn't the expensive part keeping the tokenized input in memory?