|
|
|
|
|
by _heimdall
340 days ago
|
|
It that just avoids having to send the full context for follow-up requests, right? My understanding is that caching helps to keep the context around but can't avoid the need to process that context over and over during inference. |
|