|
|
|
|
|
by red2awn
189 days ago
|
|
Correct, it's breaks the single prompt, single completion assumption baked into the frameworks. Conceptually it's still prompt/completion but for low latency response you have to do streaming KV cache prefill with a websocket server. |
|