| HN Mirror

Sorry. You do mention a linear systems response, and that's what I meant.

In that setting, the eigenvectors work as a generalized forward and inverse fourier transform, and the eigenvalues form the transfer function you allude to in the bold sentence

"The attention mechanism’s role is the same as that of a transfer function in a linear time-invariant system, namely it calculates the frequency response of the transformer model,"

Specifically, it seems to me that this requires a _symmetric_ attention matrix. Which you get from the self-attention mechanisms (two of the three places where they're used in transformers), but not all of them, notably not the one that combines the output of the first two attention mechanisms (one input, and one output)