| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by cjbprime 842 days ago
	I think the answer's in the original question: the provider has to pay for extra storage to cache the model state at the prompt you're asking to snapshot. But it's not necessarily a net increase in costs for the provider, because in exchange for doing so they (as well as you) are getting to avoid many expensive inference rounds.

1 comments

Isn't the expensive part keeping the tokenized input in memory?