|
|
|
|
|
by juliangoldsmith
974 days ago
|
|
I found this video helpful for understanding transformers in general, but it covers attention too: https://www.youtube.com/watch?v=kWLed8o5M2Y The short version (as I understand it) is that you use a neural network to weight pairs of inputs by their importance to each other. That lets you get rid of unimportant information while keeping what actually is important. |
|