Hacker News
user: thw20
created: 2024-04-24
karma: 1
submissions:
Simple, zero overhead way to compress model, KV cache via Low-Rank Decomposition
1 point | 0 comments
0 points | 0 comments
0 points | 0 comments
0 points | 0 comments
Towards understanding multiple attention sinks in LLMs
1 point | 2 comments
0 points | 0 comments
The Existence and Behavior of Secondary Attention Sinks
1 point | 0 comments