by baruch 39 days ago
It is possible to get more tokens out of the same hardware by using fast storage for the KVCache. This is especially useful for agentic workloads.
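
To make the idea concrete, here is a rough sketch of what "fast storage for the KVCache" could look like: a small in-memory LRU that spills evicted entries to files instead of dropping them, so a later request with the same prefix can reload the cached state rather than recompute it. Everything here is an assumption for illustration: `TieredKVCache`, `spill_dir`, `mem_capacity`, and the pickled dicts standing in for real KV tensors are hypothetical names, not any particular inference engine's API.

```python
import hashlib
import os
import pickle
import tempfile
from collections import OrderedDict


class TieredKVCache:
    """Toy two-tier cache: a small in-memory LRU backed by files on disk."""

    def __init__(self, spill_dir, mem_capacity=2):
        self.spill_dir = spill_dir          # hypothetical "fast storage" tier
        self.mem_capacity = mem_capacity    # hypothetical memory budget, in entries
        self.mem = OrderedDict()            # key -> KV blob, most recently used last

    @staticmethod
    def key_for(prefix_tokens):
        # Identify a prompt prefix by a hash of its token ids.
        data = ",".join(map(str, prefix_tokens)).encode()
        return hashlib.sha256(data).hexdigest()

    def _path(self, key):
        return os.path.join(self.spill_dir, key + ".kv")

    def put(self, prefix_tokens, kv_blob):
        key = self.key_for(prefix_tokens)
        self.mem[key] = kv_blob
        self.mem.move_to_end(key)
        # Evict least recently used entries to storage instead of dropping them.
        while len(self.mem) > self.mem_capacity:
            old_key, old_blob = self.mem.popitem(last=False)
            with open(self._path(old_key), "wb") as f:
                pickle.dump(old_blob, f)

    def get(self, prefix_tokens):
        """Return (kv_blob, tier), where tier is 'memory', 'storage', or 'miss'."""
        key = self.key_for(prefix_tokens)
        if key in self.mem:
            self.mem.move_to_end(key)
            return self.mem[key], "memory"
        path = self._path(key)
        if os.path.exists(path):
            with open(path, "rb") as f:
                blob = pickle.load(f)
            self.put(prefix_tokens, blob)   # promote back into memory
            return blob, "storage"
        return None, "miss"                 # caller would have to recompute the prefix


def demo():
    with tempfile.TemporaryDirectory() as d:
        cache = TieredKVCache(d, mem_capacity=2)
        for i in range(3):
            # Placeholder blobs; a real system would store per-layer K/V tensors.
            cache.put([1, 2, 3, i], {"layer0": [float(i)]})
        return [
            cache.get([1, 2, 3, 2])[1],
            cache.get([1, 2, 3, 0])[1],
            cache.get([9, 9, 9])[1],
        ]


if __name__ == "__main__":
    print(demo())
```

In an agentic loop the same long prefix (system prompt, tool definitions, earlier turns) tends to come back on every step, so a "storage" hit replaces a prefill recompute with a read. Whether that is a net win depends on the read bandwidth of the storage versus the cost of recomputing the prefix, which this sketch does not model.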