by wills_forward 1063 days ago
MIT and Microsoft Researchers Introduce "RetNet" - An 8X Faster Transformer Alternative for AI
2 comments

The paper claims parallelism for training (which is not a fixed speedup), a different complexity for inference (constant time), and a different complexity for large-context inference (linear). So nothing that can be summarised as "8x". Or am I getting this summary wrong?
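For anyone following along, here is a rough sketch of why both a parallel training form and a constant-cost-per-token inference form can exist for the same layer. It is a simplified single-head version of retention as I understand it from the paper: it assumes a scalar decay `gamma` and leaves out the positional rotation, normalization, and gating, and the function names are my own, not the authors' code.

```python
import numpy as np

def retention_parallel(Q, K, V, gamma):
    """Training-style form: all positions computed at once with a decay mask."""
    n = Q.shape[0]
    idx = np.arange(n)
    diff = idx[:, None] - idx[None, :]  # diff[i, j] = i - j
    # Causal decay mask: gamma^(i - j) for j <= i, zero for future positions.
    D = np.where(diff >= 0, gamma ** np.maximum(diff, 0), 0.0)
    return (Q @ K.T * D) @ V

def retention_recurrent(Q, K, V, gamma):
    """Inference-style form: a fixed-size state S is updated once per token."""
    d_k, d_v = K.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v))            # state size does not depend on sequence length
    out = np.empty((Q.shape[0], d_v))
    for t in range(Q.shape[0]):
        S = gamma * S + np.outer(K[t], V[t])
        out[t] = Q[t] @ S
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((6, 4)) for _ in range(3))
# Both forms should agree up to floating-point error.
assert np.allclose(retention_parallel(Q, K, V, 0.9), retention_recurrent(Q, K, V, 0.9))
```

The per-token work in the recurrent form depends only on the state size, not on how many tokens came before, which is where the constant-time claim comes from. None of this reduces to a single speedup factor.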
The words per second, I believe, from the first graph in the paper.
What’s the MIT connection? The authors all seem to be affiliated with Tsinghua.