Hacker News
by imh 3504 days ago
One other cool part of attention is that you can attend to the m-dimensional parts (rows) of an n-by-m matrix just as well as those of a k-by-m matrix. Objects (sentences) of varying size can be treated the same way, in a really nicely principled manner.
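
To make that concrete, here is a minimal NumPy sketch, assuming plain dot-product attention with a single query vector. The function name `attend`, the variable names, and the sizes (3 and 7 rows, m = 4) are all made up for illustration; nothing here comes from a particular paper or library.

```python
import numpy as np

def attend(query, items):
    """Dot-product attention: query is (m,), items is (rows, m); returns (m,)."""
    scores = items @ query                    # one score per row, shape (rows,)
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()                  # weights are positive and sum to 1
    return weights @ items                    # weighted average of rows, shape (m,)

rng = np.random.default_rng(0)
m = 4
query = rng.normal(size=m)
short = rng.normal(size=(3, m))   # a "sentence" with n = 3 rows
long = rng.normal(size=(7, m))    # a "sentence" with k = 7 rows

out_short = attend(query, short)
out_long = attend(query, long)
print(out_short.shape, out_long.shape)  # both are (m,)
```

The softmax normalizes over however many rows there are, so the same function handles both inputs and the result is always a single m-dimensional vector.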