Hacker News new | ask | show | jobs
by i_am_proteus 1485 days ago
While I am inclined to personally agree with your sentiment, I don't think I have better insights than Richard Sutton: http://incompleteideas.net/IncIdeas/BitterLesson.html

"The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin."

1 comments

There's a great motivation for small model work for big-model results: More efficient use of compute can be leveraged to make big models effectively bigger. Small-model architectural innovations are computational leverage. You can even see the convolution operation in this light; it's much more efficient than the 'giant dense matrix' approach.

EfficientNet is an exemplar of this approach; they made much better small models, and wound up with much higher quality big models as a result of having better architecture overall: https://arxiv.org/pdf/1905.11946.pdf

We're currently seeing some great results with more efficient attention layers, which will make the current 'big' models much more efficient... And unlock a next generation of higher quality big models.