| HN Mirror



	by just_a_quack 1130 days ago
	There's not. The positional encodings are generated using sines and cosines such that the encoding at any fixed offset can be expressed as a linear function of the encoding at the original position. Using the DFT here would not make sense, as the positional encodings are fixed anyway, and at inference time this method generalizes nicely because the frequencies in the arguments of the positional encoding functions form a geometric progression.
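The linearity claim can be checked numerically. The following is a minimal sketch, assuming the interleaved sin/cos layout and base 10000 from the original Transformer paper; the function names and the small `d_model=8` are hypothetical choices for illustration. Each (sin, cos) pair at position `pos + k` is obtained from the pair at `pos` by a rotation that depends only on the offset `k`.

```python
import math

def positional_encoding(pos, d_model=8, base=10000.0):
    """Sinusoidal encoding laid out as [sin, cos, sin, cos, ...] pairs."""
    pe = []
    for i in range(0, d_model, 2):
        freq = base ** (-i / d_model)  # frequencies form a geometric progression
        pe.extend([math.sin(pos * freq), math.cos(pos * freq)])
    return pe

def shift(pe, k, d_model=8, base=10000.0):
    """Rotate each (sin, cos) pair by k * freq; the rotation depends on k only."""
    out = []
    for i in range(0, d_model, 2):
        freq = base ** (-i / d_model)
        c, s = math.cos(k * freq), math.sin(k * freq)
        sin_p, cos_p = pe[i], pe[i + 1]
        # Angle-addition identities, written as a linear map on (sin_p, cos_p).
        out.extend([sin_p * c + cos_p * s, cos_p * c - sin_p * s])
    return out

def max_shift_error(pos, k):
    """Largest gap between the shifted encoding of pos and the encoding of pos + k."""
    shifted = shift(positional_encoding(pos), k)
    target = positional_encoding(pos + k)
    return max(abs(a - b) for a, b in zip(shifted, target))
```

Calling `max_shift_error(5, 3)` should return a value at floating-point noise level, since the linear map is exact up to rounding.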

1 comment

chaxor 1128 days ago

There isn't a DFT directly; the connection is a more basic one. A circulant matrix (here, the linear graph of words) always has the same eigenvectors and is diagonalized by the DFT.

The PE in the original Vaswani et al. paper is based on this; they just didn't put in all the details. So effectively the model gets hints from the PE that it's a linear graph, because these are the eigenvectors.
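The circulant claim can also be checked directly. This is a rough sketch, assuming the word graph is closed into a ring so that its adjacency matrix is exactly circulant (an open chain is only approximately so); the helper names are hypothetical. It verifies that every DFT basis vector is an eigenvector of the matrix.

```python
import cmath

def circulant(first_row):
    """Build a circulant matrix: row i is the first row cyclically shifted by i."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]

def dft_vector(n, k):
    """k-th DFT basis vector: exp(2*pi*i*k*m/n) for m = 0..n-1."""
    return [cmath.exp(2j * cmath.pi * k * m / n) for m in range(n)]

def eigen_residual(C, k):
    """Max |C v - lambda v| for the k-th DFT vector v; near zero if v is an eigenvector."""
    n = len(C)
    v = dft_vector(n, k)
    Cv = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = Cv[0] / v[0]  # v[0] == 1, so this reads off the eigenvalue
    return max(abs(Cv[i] - lam * v[i]) for i in range(n))

# Ring of 6 positions: each position is linked to its two neighbours.
ring = circulant([0, 1, 0, 0, 0, 1])
residuals = [eigen_residual(ring, k) for k in range(6)]
```

The real and imaginary parts of these eigenvectors are cosines and sines of the position index, which is the link to the sinusoidal PE that the comment describes.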
