Hacker News
by 7to2 1191 days ago
Do you know what the WPEs (learned word position embeddings) are for LLaMA?

It doesn't use them. It uses rotary position embeddings (RoPE), which are fixed rather than learned and are applied multiplicatively, as a position-dependent rotation, at every attention layer to both the query and the key. The values are not rotated.

https://arxiv.org/abs/2104.09864
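For a concrete picture, here is a minimal pure-Python sketch of the rotation, assuming the interleaved-pair convention and base 10000 from the RoFormer paper (some implementations pair dimensions differently). The names `rope` and `dot` are made up for illustration. It rotates a query and a key by their positions and checks that the attention score depends only on the offset between the two positions, which is why only queries and keys need the rotation.

```python
import math

def rope(x, pos, base=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) by the angle pos * theta_i,
    where theta_i = base ** (-2i / d). Assumes the interleaved-pair layout."""
    d = len(x)
    assert d % 2 == 0, "head dimension must be even"
    out = []
    for i in range(d // 2):
        angle = pos * base ** (-2 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = x[2 * i], x[2 * i + 1]
        # Standard 2D rotation of the pair; no learned parameters involved.
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

# Toy query/key vectors for a single head (values are arbitrary).
q = [0.3, -1.2, 0.5, 0.8]
k = [1.0, 0.4, -0.7, 0.2]

# Same relative offset (3 positions) at two different absolute positions.
s_near = dot(rope(q, 5), rope(k, 2))
s_far = dot(rope(q, 105), rope(k, 102))
print(abs(s_near - s_far) < 1e-9)  # True: the score depends only on the offset
```

Because a rotation preserves vector length and the dot product of two rotated vectors depends only on the difference of their angles, relative position falls out of the q·k score without any learned position table.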