Hacker News new | ask | show | jobs
by immibis 710 days ago
Transformers reduce the number of relationships between tokens that must be learned, too. An MLP has to separately learn all possible relationships between token 1 and 2, and 2 and 3, and 3 and 4. A transformer can learn relationships between specific values regardless of position.