Hacker News
by gslaller 459 days ago
A noob speaking here. Why aren't there efforts to build a memory-bank-like structure where you attend to a subset of codes depending on the key (at the attention level)? Or is this already done by the global attention mechanism (what is that, even)?
1 comment

There are KV (key/value) optimisations, but I'm unsure whether Gemma works with them; I haven't tried.
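
To make the idea in the question a bit more concrete, here is a rough sketch of a memory bank where a query attends only to the top-k slots whose keys match it best. Everything here is an assumption for illustration (the function name `topk_memory_attention`, the list-of-lists layout, the choice of k); it is not how Gemma or any particular model does it.

```python
import math

def topk_memory_attention(query, keys, values, k=2):
    """Attend to only the k memory slots whose keys best match the query.

    Hypothetical helper for illustration; names and shapes are assumptions.
    query:  list of d floats
    keys:   list of n key vectors, each of length d
    values: list of n value vectors, all the same length
    """
    d = len(query)
    # Scaled dot-product score between the query and every key in the bank.
    scores = [sum(q * x for q, x in zip(query, key)) / math.sqrt(d) for key in keys]
    # Keep only the indices of the k highest-scoring slots.
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected slots only (subtract the max for stability).
    m = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - m) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the selected value vectors.
    dim_v = len(values[0])
    out = [sum(w * values[i][j] for w, i in zip(weights, top)) for j in range(dim_v)]
    return out, top, weights

# Tiny toy bank: 4 slots with 2-dimensional keys and 1-dimensional values.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.9, 0.1]]
values = [[1.0], [2.0], [3.0], [4.0]]
out, top, weights = topk_memory_attention([1.0, 0.0], keys, values, k=2)
print(top, weights, out)
```

Note that this toy version still scores every key before picking the top k, so it only saves work on the softmax and the value sum. As far as I understand, practical designs would try to avoid the full scan, for example with some approximate nearest-neighbour lookup over the keys.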