Hacker News
by rishabhaiover 64 days ago
After a certain amount of context usage, I think I'm seeing the stated issues with the Top-K compression strategy empirically. It doesn't forget catastrophically, but nuances fade as I approach the tail end of my context limit.
1 comment

Yeah, that’s consistent. Top-K keeps the obvious, high-scoring tokens, but subtle context gets eroded gradually over repeated compression rather than dropped all at once.
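
A rough sketch of that erosion, assuming "Top-K" here means keeping the k highest-scoring tokens on each compression pass. The importance scores, the token list, and the helper names `top_k_compress` and `simulate` are all made up for illustration; a real system would derive scores from attention weights or something similar.

```python
def top_k_compress(scores, k):
    """Return the indices of the k highest-scoring items, in original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def simulate(tokens, scores, budgets):
    """Apply top-k compression once per budget and record what survives."""
    kept = list(range(len(tokens)))  # indices into the original token list
    history = []
    for k in budgets:
        local_scores = [scores[i] for i in kept]
        survivors = top_k_compress(local_scores, k)
        kept = [kept[j] for j in survivors]
        history.append([tokens[i] for i in kept])
    return history


# Hypothetical tokens and importance scores (assumed, not measured).
tokens = ["deploy", "failed", "though", "only", "on", "staging", "after", "rollback"]
scores = [0.9, 0.8, 0.2, 0.3, 0.1, 0.7, 0.25, 0.6]

# The budget shrinks as the context fills up.
history = simulate(tokens, scores, budgets=[6, 4])
for round_number, survivors in enumerate(history, start=1):
    print(round_number, " ".join(survivors))
```

In this toy run the gist ("deploy failed ... staging ... rollback") survives both passes, while the low-scoring qualifiers like "only" and "after" are gone by the second pass. That is the "nuance fades, nothing catastrophic" pattern, at least under these assumed scores.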