by
7to2
1191 days ago
Do you know what the WPEs are for llama?
1 comment
sebzim4500
1191 days ago
It doesn't really use them. It uses something called RoPE (rotary position embedding), which is hardcoded rather than learned and is applied multiplicatively, as a rotation, at every attention layer to both the query and the key (the value is left untouched).
https://arxiv.org/abs/2104.09864
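For anyone who wants to see what "hardcoded and multiplicative" means in practice, here is a rough sketch in plain Python. It follows the pairwise-rotation form from the paper, not LLaMA's actual code; the names `rope` and `dot`, the toy vectors, and the base of 10000 are my own assumptions for illustration.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate each consecutive pair of `vec` by an angle that depends
    only on the position `pos` and the pair's index (nothing is learned)."""
    d = len(vec)
    assert d % 2 == 0, "dimension must be even"
    out = []
    for i in range(0, d, 2):
        # Fixed frequency for this pair: base^(-i/d), with i = 0, 2, 4, ...
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # Standard 2D rotation of the pair (x, y) by theta
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy query and key vectors (hypothetical values)
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.3, -0.7, 1.0]

# Same relative offset (2) at different absolute positions
score_a = dot(rope(q, 3), rope(k, 1))
score_b = dot(rope(q, 10), rope(k, 8))
```

Because the rotations compose, the attention score between a rotated query and a rotated key depends only on how far apart the two positions are, so `score_a` and `score_b` agree up to floating-point error. The value vectors never go through `rope`.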