Hacker News
by imh 3504 days ago
One other cool part of attention is that you can attend to the m-dimensional parts (rows) of an n-by-m matrix just as well as those of a k-by-m matrix. Objects of varying size, like sentences of different lengths, can all be treated the same in a really nicely principled way.
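
To make that concrete, here's a rough sketch in plain Python. It assumes simple dot-product attention with a softmax over the rows; the `attend` function, the query vector, and the toy matrices are all made up for illustration and don't come from any particular library or paper. The same function handles a 2-by-3 and a 5-by-3 input, and the result is m-dimensional either way.

```python
import math

def attend(query, rows):
    """Dot-product attention over the rows of a matrix (hypothetical helper).

    query: list of m floats
    rows:  list of any number of m-dimensional rows
    Returns an m-dimensional weighted average of the rows.
    """
    # One score per row: how well the row matches the query.
    scores = [sum(q * x for q, x in zip(query, row)) for row in rows]
    # Softmax over however many rows there are (max subtracted for stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of rows: the length is m no matter how many rows came in.
    m = len(query)
    return [sum(w * row[j] for w, row in zip(weights, rows)) for j in range(m)]

query = [1.0, 0.0, -1.0]  # m = 3

# An n-by-m "sentence" with n = 2 and a k-by-m one with k = 5.
short_sentence = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
long_sentence = [
    [0.5, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
]

out_short = attend(query, short_sentence)
out_long = attend(query, long_sentence)
print(len(out_short), len(out_long))
```

Nothing in `attend` depends on the number of rows: the softmax normalizes over whatever it is given, so whatever consumes the output only ever sees a fixed-size vector.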