by phkahler 1482 days ago
It does seem like better algorithms to get similar results from smaller models should be prioritised.

Rather than throwing more compute at a problem for a 0.03 better score, show me one tenth the compute with a 0.03 loss in score. That would be impressive and far more useful.

1 comment

While I am inclined to personally agree with your sentiment, I don't think I have better insights than Richard Sutton: http://incompleteideas.net/IncIdeas/BitterLesson.html

"The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin."

There's still a great motivation for small-model work, even if what you want is big-model results: more efficient use of compute can be leveraged to make big models effectively bigger. Small-model architectural innovations are computational leverage. You can even see the convolution operation in this light; it's much more efficient than the 'giant dense matrix' approach.
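To put a rough number on the convolution point, here is a minimal sketch that counts learnable parameters for a convolutional layer versus a dense layer producing an output of the same shape. The layer sizes (a 32x32 RGB input, 16 output channels, a 3x3 kernel) and the function names are my own illustrative assumptions, not taken from any paper, and the conv is assumed to use stride 1 with 'same' padding so the spatial size is unchanged.

```python
def conv2d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights plus biases for a k x k convolution (weights shared across positions)."""
    return c_in * c_out * k * k + c_out


def dense_params(h: int, w: int, c_in: int, c_out: int) -> int:
    """Weights plus biases for a dense layer mapping the flattened
    h*w*c_in input to a flattened h*w*c_out output."""
    n_in = h * w * c_in
    n_out = h * w * c_out
    return n_in * n_out + n_out


# Hypothetical sizes chosen only for illustration.
H, W, C_IN, C_OUT, K = 32, 32, 3, 16, 3

conv = conv2d_params(C_IN, C_OUT, K)
dense = dense_params(H, W, C_IN, C_OUT)

print(f"conv params:  {conv:,}")
print(f"dense params: {dense:,}")
print(f"ratio:        {dense / conv:,.0f}x")
```

With these sizes the conv layer needs 448 parameters and the dense layer about 50 million. The gap comes entirely from weight sharing and locality, which is the kind of architectural leverage described above.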

EfficientNet is an exemplar of this approach; they made much better small models, and wound up with much higher quality big models as a result of having better architecture overall: https://arxiv.org/pdf/1905.11946.pdf

We're currently seeing some great results with more efficient attention layers, which will make the current 'big' models much more efficient, and unlock a next generation of higher quality big models.