by enjeyw
159 days ago
One of the big problems with attention mechanisms is that the query has to look over every single key, which gets very expensive for long contexts. A little side project I've been working on is training a model that sits on top of the LLM, looks at each key, determines whether it will still be needed after a certain lifespan, and if not, evicts it once that lifespan has expired. Still working on it, but my first-pass test shows a 90% reduction in the number of keys! https://github.com/enjeyw/smartkv
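
To make the eviction idea concrete, here is a minimal toy sketch of a lifespan-based cache. It is not the smartkv code: the names `CacheEntry`, `LifespanKVCache`, `toy_lifespan_predictor` and `simulate` are hypothetical, the learned model is replaced by a hard-coded rule, and it assumes each key gets a single predicted lifespan measured in decoding steps.

```python
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    position: int  # decoding step at which the key was written
    lifespan: int  # assumed: predicted number of steps the key stays useful


@dataclass
class LifespanKVCache:
    entries: list = field(default_factory=list)

    def add(self, position, lifespan):
        self.entries.append(CacheEntry(position, lifespan))

    def evict_expired(self, current_step):
        """Drop entries whose lifespan has elapsed; return how many were evicted."""
        before = len(self.entries)
        self.entries = [
            e for e in self.entries
            if current_step - e.position < e.lifespan
        ]
        return before - len(self.entries)


def toy_lifespan_predictor(position):
    # Hypothetical stand-in for the learned model:
    # every 10th key is treated as long-lived, the rest expire after 4 steps.
    return 1000 if position % 10 == 0 else 4


def simulate(num_tokens):
    """Run a toy decode loop and return how many keys remain in the cache."""
    cache = LifespanKVCache()
    for step in range(num_tokens):
        cache.add(step, toy_lifespan_predictor(step))
        cache.evict_expired(step)
    return len(cache.entries)
```

Calling `simulate(n)` returns the number of keys the query would still have to attend over after `n` steps. The numbers in the toy predictor are arbitrary and say nothing about the 90% figure above; the point is only that attention cost then scales with the surviving keys rather than the full context.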