| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by safarimonkey 1098 days ago

Some of the techniques improve over linear scaling of the baseline models. For example, from the article:

> Conditional computation avoids applying all model parameters to all tokens from the input sequence. [CoLT5] applies heavy computations only to the most important tokens and processes the rest of the tokens with a lighter version of layers. It will speed up both training and inference.

[CoLT5]: https://arxiv.org/abs/2303.094752

1 comments

gdiamos 1098 days ago

Thanks, that's interesting.

link