by lucidrains 615 days ago
doesn't this mean we should explore the use of talking-heads attention (Shazeer et al.) a bit more?
https://arxiv.org/abs/2003.02436