Hacker News
user: thw20
created: 2024-04-24
karma: 1

submissions:

Simple, zero overhead way to compress model, KV cache via Low-Rank Decomposition
1 point | 0 comments
Towards understanding multiple attention sinks in LLMs
1 point | 2 comments
The Existence and Behavior of Secondary Attention Sinks
1 point | 0 comments