by verdverm 513 days ago
This one might be right, if they have in fact unified multiple attention approaches into a single framework.

See Section 3.4.