Hacker News
by enjeyw 159 days ago
One of the big problems with attention mechanisms is that each query has to look at every single key, which gets very expensive for long contexts.
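
To make the cost concrete, here is a minimal single-query scaled dot-product attention in plain Python. It is only an illustration (no batching, no heads, no masking). The point is the first list comprehension: one dot product per cached key, so per-query cost is linear in the number of keys, and a full sequence of n tokens (n queries over n keys) needs n² scores.

```python
import math
from typing import List, Sequence


def attend(query: Sequence[float],
           keys: Sequence[Sequence[float]],
           values: Sequence[Sequence[float]]) -> List[float]:
    """Scaled dot-product attention for a single query."""
    d = len(query)
    # One dot product per cached key: this is the loop that grows with context length.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Softmax over all scores (subtract the max for numerical stability).
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Output is the attention-weighted sum of the value vectors.
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
```

When decoding with a cache of n keys, every new token pays for n dot products, so dropping keys from the cache shrinks that loop directly.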

A little side project I've been working on is training a model that sits on top of the LLM, looks at each key, and decides whether it will still be needed after a certain lifespan; if not, the key is evicted once that lifespan expires. Still a work in progress, but my first-pass test cuts the number of keys by 90%!

https://github.com/enjeyw/smartkv
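
A minimal sketch of lifespan-based eviction as described above, assuming a single fixed `lifespan` and a per-key yes/no verdict made once, when the key is written. `LifespanKVCache`, `keep_predictor` and the toy predictor are hypothetical names; this is not the smartkv code, and the trained model is replaced by a plain callable.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class CacheEntry:
    position: int          # token index at which this key/value pair was written
    key: List[float]
    value: List[float]
    keep: bool             # predictor's verdict: still needed after the lifespan?


@dataclass
class LifespanKVCache:
    lifespan: int                                      # hypothetical fixed lifespan, in tokens
    keep_predictor: Callable[[Sequence[float]], bool]  # stand-in for the trained model
    entries: List[CacheEntry] = field(default_factory=list)

    def append(self, position: int, key: List[float], value: List[float]) -> None:
        # Ask the predictor once, when the key is written.
        self.entries.append(CacheEntry(position, key, value, self.keep_predictor(key)))
        self.evict(position)

    def evict(self, current_position: int) -> None:
        # Drop keys whose lifespan has expired, unless the predictor said to keep them.
        self.entries = [
            e for e in self.entries
            if e.keep or current_position - e.position < self.lifespan
        ]


# Usage: a toy predictor that keeps every fifth key.
cache = LifespanKVCache(lifespan=4, keep_predictor=lambda k: k[0] > 0.5)
for pos in range(10):
    key = [1.0] if pos % 5 == 0 else [0.0]
    cache.append(pos, key, value=[float(pos)])
print([e.position for e in cache.entries])  # [0, 5, 6, 7, 8, 9]
```

Attention then only loops over `cache.entries`, so whatever fraction of keys gets evicted comes straight off the per-query cost.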

1 comment

Isn't this similar to DeepSeek's lightning indexer?