Hacker News
by vlovich123 450 days ago
Ah OK. So this is for resuming chat context cheaply. What I said is still correct - 3FS is not part of the inference flow & is not relevant to the paper, which is about optimizing KV cache usage at runtime.