Hacker News
by lxgr 26 days ago
That said, the KV cache is very much not stateless, so internally, inference APIs will be highly incentivized to route each request to an instance that already has as much of its prefix cached as possible.
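
To make the routing idea concrete, here's a rough sketch of what "pick the instance with the longest cached prefix" could look like. This is purely illustrative and not how any particular provider does it: the instance names, the `cached` map, and both function names are made up, and a real router would presumably also weigh load, cache eviction, and would track prefixes with hashes or a trie rather than raw token lists.

```python
from typing import Dict, List, Sequence


def shared_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of leading tokens that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_instance(request: Sequence[int],
                  cached: Dict[str, List[Sequence[int]]]) -> str:
    """Return the instance whose cache shares the longest prefix with request.

    `cached` maps a hypothetical instance name to the token sequences it
    currently holds KV entries for. Ties go to the first instance seen.
    """
    best_name, best_len = None, -1
    for name, prefixes in cached.items():
        longest = max((shared_prefix_len(request, p) for p in prefixes), default=0)
        if longest > best_len:
            best_name, best_len = name, longest
    if best_name is None:
        raise ValueError("no instances available")
    return best_name


# Hypothetical cache state: token IDs stand in for a tokenized prompt.
cached = {
    "gpu-a": [[1, 2, 3, 4]],
    "gpu-b": [[1, 2, 9], [1, 2, 3, 4, 5, 6]],
}
print(pick_instance([1, 2, 3, 4, 5, 7], cached))  # prints "gpu-b"
```

Here "gpu-b" wins because it can reuse five tokens of cached KV state versus four on "gpu-a", so only the tail of the prompt needs to be prefilled from scratch.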