by qeternity
113 days ago
> With prompt caching, verbose context that gets reused is basically free.

But it's not. It might be discounted cost-wise, but it will still degrade attention and make generation slower and more computationally expensive, even if you have a long prefix you can reuse during prefill.
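To make that concrete, here is a toy count, not a benchmark. The function names are made up for illustration, and it ignores layers, heads, batching, and kernel details. It only assumes that a cache hit skips the forward pass over the cached prefix, while every generated token still attends over the whole prompt.

```python
def prefill_positions(prompt_len: int, cached_len: int) -> int:
    """Prompt tokens that still need a forward pass after a cache hit
    covering the first cached_len tokens (hypothetical helper)."""
    return max(prompt_len - cached_len, 0)


def decode_attention_reads(prompt_len: int, new_tokens: int) -> int:
    """Total earlier positions read from the KV cache while generating
    new_tokens tokens (hypothetical helper).

    Token i attends over the full prompt plus the i tokens generated
    before it, whether or not the prompt came out of a cache.
    """
    return sum(prompt_len + i for i in range(new_tokens))


if __name__ == "__main__":
    # Caching shrinks the prefill work...
    print(prefill_positions(prompt_len=10_000, cached_len=9_000))
    # ...but decode work still scales with the full prompt length.
    print(decode_attention_reads(prompt_len=10_000, new_tokens=100))
    print(decode_attention_reads(prompt_len=1_000, new_tokens=100))
```

Under these assumptions the cache only changes the first number. The decode reads depend on the total prompt length, so a verbose 10k-token prefix costs roughly ten times as many reads per generated token as a 1k-token one, cached or not.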