by thunderbird120 18 days ago
Yes. Since the weights being updated are a small subset of the total, it's manageable. Just as each conversation currently needs its own KV cache, you'd need to store the fast weights separately per conversation. Both the KV cache and the fast-weight store have to be conversation-specific, so setting a bit of extra RAM aside for "memory" isn't really a new ask, just a different format for an old problem.
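To put rough numbers on that, here's a minimal Python sketch of the per-conversation bookkeeping. Everything in `ModelShape` is a made-up assumption for illustration (layer count, head sizes, and especially `n_fast_params`, since "a small subset" isn't quantified anywhere), and `ConversationMemory` is a hypothetical helper, not any real serving API. The KV-cache formula is the usual one key and one value vector per layer, per KV head, per token.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    # All values are hypothetical, chosen only to make the arithmetic concrete.
    n_layers: int = 32
    n_kv_heads: int = 8
    head_dim: int = 128
    n_fast_params: int = 8_000_000  # assumed size of the "small subset" of fast weights
    bytes_per_value: int = 2        # fp16 / bf16


def kv_cache_bytes(shape: ModelShape, tokens: int) -> int:
    # One key and one value vector per layer, per KV head, per token,
    # so this grows linearly with conversation length.
    return (2 * shape.n_layers * shape.n_kv_heads * shape.head_dim
            * tokens * shape.bytes_per_value)


def fast_weight_bytes(shape: ModelShape) -> int:
    # Fixed size: independent of how long the conversation is.
    return shape.n_fast_params * shape.bytes_per_value


class ConversationMemory:
    """Tracks per-conversation state sizes; nothing is shared across ids."""

    def __init__(self, shape: ModelShape):
        self.shape = shape
        self.tokens = {}  # conversation id -> tokens seen so far

    def append(self, conv_id: str, n_tokens: int) -> None:
        self.tokens[conv_id] = self.tokens.get(conv_id, 0) + n_tokens

    def bytes_for(self, conv_id: str) -> dict:
        return {
            "kv_cache": kv_cache_bytes(self.shape, self.tokens[conv_id]),
            "fast_weights": fast_weight_bytes(self.shape),
        }

    def drop(self, conv_id: str) -> None:
        # Ending a conversation frees both kinds of state together.
        del self.tokens[conv_id]


if __name__ == "__main__":
    mem = ConversationMemory(ModelShape())
    mem.append("conv-a", 4096)
    mem.append("conv-b", 512)
    for conv_id in ("conv-a", "conv-b"):
        print(conv_id, mem.bytes_for(conv_id))
```

With these made-up numbers, a 4096-token conversation holds 512 MiB of KV cache next to 16 MB of fast weights. The ratio depends entirely on the assumed subset size, but the shape of the problem is the same either way: one more per-conversation buffer that gets allocated and freed alongside the cache.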