
Hacker News


	by qeternity 113 days ago
	> With prompt caching, verbose context that gets reused is basically free.

	But it's not. It might be discounted cost-wise; however, it will still degrade attention and make generation slower and more computationally expensive, even when you have a long prefix you can reuse during prefill.
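A rough sketch of both points, assuming plain single-head scaled dot-product attention with a KV cache. This is a simplification: real models run many heads and layers, and `relevant_weight` is a toy model of attention dilution, not a measurement of any particular model. The function names are hypothetical. The idea is that a cached prefix skips recomputing keys and values during prefill, but every generated token still has to score its query against every cached key and mix in every cached value. So per-token decode work grows with context length, and a fixed "relevant" score gets a smaller softmax share as more irrelevant tokens are added.

```python
import math


def decode_step_attention(query, k_cache, v_cache):
    """One decode step of single-head scaled dot-product attention over a KV cache.

    The cached keys/values are reused as-is (no prefill recomputation), but the
    new query is still scored against every cached key and every cached value is
    mixed into the output, so the work grows linearly with the cache length.
    Assumes a non-empty cache. Returns (output_vector, multiply_accumulate_count).
    """
    d = len(query)
    macs = 0
    scores = []
    for k in k_cache:
        scores.append(sum(q * kk for q, kk in zip(query, k)) / math.sqrt(d))
        macs += d  # one length-d dot product per cached key
    # Numerically stable softmax over all cached positions.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    out = [0.0] * len(v_cache[0])
    for w, v in zip(weights, v_cache):
        for i, vi in enumerate(v):
            out[i] += w * vi
        macs += len(v)  # one length-d weighted accumulation per cached value
    return out, macs


def relevant_weight(relevant_score, n_distractors, distractor_score=0.0):
    """Softmax weight on one relevant key when n_distractors other keys all
    score distractor_score.

    Toy model of dilution: the relevant key's score stays fixed, so its share
    of attention shrinks as irrelevant context is added.
    """
    num = math.exp(relevant_score)
    return num / (num + n_distractors * math.exp(distractor_score))


if __name__ == "__main__":
    d = 64
    query = [0.01] * d
    for n in (1_000, 4_000):
        k_cache = [[0.01] * d for _ in range(n)]
        v_cache = [[1.0] * d for _ in range(n)]
        _, macs = decode_step_attention(query, k_cache, v_cache)
        # 2 * n * d multiply-accumulates per head, per layer, per generated token
        print(n, macs)
    for n in (10, 100, 1000):
        print(n, round(relevant_weight(4.0, n), 3))
```

Running it shows the per-token attention work scaling with cache length, and the relevant key's weight falling as distractors are added, even though nothing in the cached prefix was recomputed.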