I think this bitter lesson needs to be taken with several grains of salt. Number one, progress in a particular AI field tends at first to go from custom to more general algorithms, exactly as Professor Richard Sutton described. However, there is a second part to this progress. Once we have "understood" (which we never really do) the new level of general algorithms (say, Transformers in NLP), we begin to put back in all the things we learned before (say, drawing on linguistics, we put the bias towards compositionality and the corresponding tree structures back into Transformers). Number two, computationally scalable algorithms always win in environments where you have unlimited access to computation and data, i.e. if you are working for Google, Facebook, Alibaba, etc. At other companies, you have a limited computational budget and limited data, so you could end up putting a lot of sophisticated inductive biases back into your DL algorithms.
A counterexample would be to show a number of successful cases in, say, computer vision where handcrafted features do better than learned features. This is largely not the case: in both NLP and computer vision, learned features dominate, even at companies with less compute (they use pretrained models).
(Disclaimer: I work with Rich.)