Hacker News new | ask | show | jobs
by srush 968 days ago
The model targets the decoder part of the system which is the speed bottleneck. So for tasks like classification it is not likely to be helpful. However a similar method could be used for that use case. (Coauthor)