by wongarsu
618 days ago
One big thing that bells and whistles do is limit the space the training has to search. For example, when CNNs took over computer vision, that wasn't because they were doing something that dense networks couldn't do. It was because they removed a lot of edges that didn't really matter, allowing us to spend our training budget on deeper networks. Similarly, transformers are great because they allow us to train gigantic networks somewhat efficiently. And this paper finds that if we make RNNs a lot faster to train, they are actually pretty good. Training speed and efficiency remain the big bottleneck, not the actual expressiveness of the architecture.
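To put rough numbers on the "removed a lot of edges" point, here is a back-of-the-envelope sketch in Python. The layer sizes (a 32x32 RGB input, 16 output channels, a 3x3 kernel) and the helper names are my own assumptions for illustration, not anything from the paper. It just counts the weights in a dense layer versus a conv layer producing the same output shape.

```python
def dense_params(h, w, c_in, c_out):
    """Weights plus biases for a fully connected layer that maps an
    h*w*c_in input to an h*w*c_out output (every input connects to every output)."""
    n_in = h * w * c_in
    n_out = h * w * c_out
    return n_in * n_out + n_out


def conv_params(k, c_in, c_out):
    """Weights plus biases for a k x k convolution: one small kernel per
    (input channel, output channel) pair, shared across all spatial positions."""
    return k * k * c_in * c_out + c_out


# Hypothetical sizes chosen only for illustration.
h, w, c_in, c_out, k = 32, 32, 3, 16, 3

dense = dense_params(h, w, c_in, c_out)
conv = conv_params(k, c_in, c_out)

print(f"dense layer parameters: {dense:,}")
print(f"conv layer parameters:  {conv:,}")
print(f"ratio: roughly {dense // conv:,}x fewer parameters in the conv layer")
```

The dense layer can represent anything the conv layer can (set most weights to zero and tie the rest), so the conv layer is not more expressive. It is just a much smaller thing to train, which is the budget you then get to spend on depth.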
Some links if interested:
[1] https://gpt3experiments.substack.com/p/understanding-neural-...
[2] https://gpt3experiments.substack.com/p/building-a-vector-dat...