Hacker News
by igorkraw 1670 days ago
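Your last point is true only with positional encodings, though. Attention itself is a permutation-equivariant function: if you permute the input tokens, the outputs are permuted in the same way and are otherwise unchanged.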
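
A quick numerical check of that claim, as a minimal sketch: single-head, unmasked self-attention in plain Python with random weights. Every name here (`self_attention`, `add_positions`, the fixed `perm`) is made up for illustration, and the "positional" signal is a toy position-dependent offset, not any particular encoding scheme.

```python
import math
import random

def matmul(a, b):
    # Nested-list matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, wq, wk, wv):
    # Single-head, unmasked self-attention with no positional information.
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scale = math.sqrt(len(q[0]))
    weights = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v)

def add_positions(x):
    # Toy position-dependent offset (an assumption for this demo only).
    return [[v + math.sin(pos + j) for j, v in enumerate(row)]
            for pos, row in enumerate(x)]

def max_abs_diff(a, b):
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

rng = random.Random(0)
n, d = 5, 4
x = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
wq, wk, wv = ([[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]
              for _ in range(3))

perm = [2, 0, 4, 1, 3]  # arbitrary fixed reordering of the 5 tokens
x_perm = [x[i] for i in perm]

# Without positions: attention(permuted x) vs. permuted attention(x).
out = self_attention(x, wq, wk, wv)
equivariance_gap = max_abs_diff(self_attention(x_perm, wq, wk, wv),
                                [out[i] for i in perm])

# With positions added before attention, the same comparison.
out_pe = self_attention(add_positions(x), wq, wk, wv)
pe_gap = max_abs_diff(self_attention(add_positions(x_perm), wq, wk, wv),
                      [out_pe[i] for i in perm])

print("gap without positions:", equivariance_gap)
print("gap with positions:   ", pe_gap)
```

The first gap should be zero up to floating-point rounding, while the second should not be, since each token now carries a signal tied to the slot it sits in. A causal mask would break the equivariance as well, which is why the sketch leaves masking out.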