|
|
|
|
|
by shpongled
310 days ago
|
|
I looked through their torch implementation and noticed that they are applying RoPE to both query and key matrices in every layer of the transformer - is this standard? I thought positional encodings were usually just added once at the first layer |
|