Hacker News new | ask | show | jobs
by safarimonkey 1098 days ago
Some of the techniques improve over linear scaling of the baseline models. For example, from the article:

> Conditional computation avoids applying all model parameters to all tokens from the input sequence. [CoLT5] applies heavy computations only to the most important tokens and processes the rest of the tokens with a lighter version of layers. It will speed up both training and inference.

[CoLT5]: https://arxiv.org/abs/2303.094752

1 comments

Thanks, that's interesting.