Hacker News
by npsomaratna, 1056 days ago
My understanding is that with NTK-aware RoPE scaling, the model does pay uniform attention. With older methods, not as much.