Hacker News
by lxgr, 26 days ago
That said, the KV cache is very much not stateless, so internally, inference APIs will be highly incentivized to route each request to the instance that already has the longest shared prefix cached.
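To make the routing idea concrete, here is a minimal sketch of prefix-aware routing. It is not how any particular provider does it: the instance names, the `cached` map, and both function names are hypothetical, and it assumes the router can see each instance's cached token sequences directly. A real router would presumably also weigh load and cache eviction, which this ignores.

```python
from typing import Dict, List, Sequence, Tuple


def shared_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Count how many leading tokens two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_instance(
    request_tokens: Sequence[int],
    cached: Dict[str, List[Sequence[int]]],
) -> Tuple[str, int]:
    """Return (instance, reusable_prefix_len) for the best cache match.

    `cached` is a hypothetical map from instance name to the token
    sequences whose KV entries that instance still holds. It must be
    non-empty. Ties go to the alphabetically first instance name.
    """
    if not cached:
        raise ValueError("no instances to route to")
    best_name, best_len = "", -1
    for name in sorted(cached):
        # Longest prefix this instance could reuse for the request.
        longest = max(
            (shared_prefix_len(request_tokens, p) for p in cached[name]),
            default=0,
        )
        if longest > best_len:
            best_name, best_len = name, longest
    return best_name, best_len


# Hypothetical cluster state: token IDs are made up for illustration.
cluster = {
    "gpu-a": [[1, 2, 3, 4]],
    "gpu-b": [[1, 2, 9]],
    "gpu-c": [],
}
print(pick_instance([1, 2, 3, 7], cluster))
```

Here the request shares three leading tokens with what `gpu-a` has cached and only two with `gpu-b`, so it goes to `gpu-a`, and only the final token would need fresh prefill.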