by just_a_quack
1130 days ago
There's not. The positional encodings are generated using sines and cosines so that, for any fixed offset, the encoding at the shifted position is a linear function of the encoding at the original position. Using the DFT here would not make sense, as the positional encodings are fixed anyway, and during inference this method generalizes nicely because the frequencies in the arguments of the positional encoding functions form a geometric progression.
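To make the "linear function" point concrete, here is a minimal sketch in plain Python. It assumes the usual interleaved sin/cos layout with base 10000; the names `positional_encoding` and `shift` and the small `d_model=8` are just illustrative choices, not anything from the paper's code. The map applied in `shift` depends only on the offset `k`, never on `pos`.

```python
import math

def positional_encoding(pos, d_model=8, base=10000.0):
    """Sinusoidal encoding stored as [sin, cos] pairs, one pair per frequency."""
    pe = []
    for i in range(d_model // 2):
        omega = base ** (-2 * i / d_model)  # frequencies form a geometric progression
        pe.extend([math.sin(pos * omega), math.cos(pos * omega)])
    return pe

def shift(pe, k, d_model=8, base=10000.0):
    """Apply the linear map (a 2x2 rotation per frequency) taking PE(pos) to PE(pos + k)."""
    out = []
    for i in range(d_model // 2):
        omega = base ** (-2 * i / d_model)
        s, c = pe[2 * i], pe[2 * i + 1]
        ck, sk = math.cos(k * omega), math.sin(k * omega)
        # Angle-addition identities:
        #   sin(a + b) = sin a cos b + cos a sin b
        #   cos(a + b) = cos a cos b - sin a sin b
        out.extend([s * ck + c * sk, c * ck - s * sk])
    return out

pos, k = 5, 3
shifted = shift(positional_encoding(pos), k)   # linear map applied to PE(pos)
direct = positional_encoding(pos + k)          # PE computed directly at pos + k
max_err = max(abs(a - b) for a, b in zip(shifted, direct))
print(max_err)  # should be at floating-point rounding level
```

The coefficients `ck` and `sk` are computed from `k` alone, which is why a fixed offset corresponds to a fixed linear transformation of the encoding.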
The PE in the original Vaswani et al. paper is based on this; they just didn't put in all the details. So effectively the model gets hints from the PE that it's a linear graph, because these are the eigenvectors.
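A rough way to check the eigenvector claim, assuming "eigenvectors" here means those of the graph Laplacian of a path (linear) graph: the sketch below builds that Laplacian in plain Python and verifies that sampled cosines satisfy L v = λ v. The helper names are made up for illustration. Note that these modes use evenly spaced frequencies, whereas the transformer PE uses geometrically spaced ones, so this only illustrates the underlying fact, not an exact match with the paper's encoding.

```python
import math

def path_laplacian(n):
    """Graph Laplacian L = D - A of a path graph on n nodes."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        # Each edge (i, i+1) adds 1 to both degrees and -1 to both off-diagonals.
        L[i][i] += 1.0
        L[i + 1][i + 1] += 1.0
        L[i][i + 1] -= 1.0
        L[i + 1][i] -= 1.0
    return L

def cosine_mode(n, k):
    """k-th cosine mode sampled at half-integer offsets (the DCT-II basis)."""
    return [math.cos(math.pi * k * (i + 0.5) / n) for i in range(n)]

def residual(n, k):
    """Largest entry of |L v - lambda v| for the k-th cosine mode."""
    L = path_laplacian(n)
    v = cosine_mode(n, k)
    lam = 2.0 - 2.0 * math.cos(math.pi * k / n)
    Lv = [sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
    return max(abs(Lv[i] - lam * v[i]) for i in range(n))

worst = max(residual(10, k) for k in range(10))
print(worst)  # should be at floating-point rounding level
```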