by lucidrains 615 days ago
doesn't this mean we should explore talking-heads attention (Shazeer et al.) a bit more? https://arxiv.org/abs/2003.02436