by famouswaffles
1134 days ago
The bitter lesson isn't really "algorithms bad", "don't try different approaches", "don't innovate" or "only work on models with massive compute". The heart of the bitter lesson is "don't try to codify 'insight' into the process". It's basically the age-old "you don't know what you don't know". The Transformer is kind of a perfect example. It boasts algorithmic improvements over RNNs, and LLMs are by far the best-performing take on language modelling ever. And yet the architecture itself owes basically nothing to any breakthrough in understanding language. It's an improvement over standard RNNs, but not because of any newfound insight into language itself. Basically, trying to cram high-level human instincts/insights into the process of solving a problem doesn't work better than giving a general architecture tons of data and letting it figure all that out by itself.
|
This is exactly right, and it's what a lot of people get wrong. Sutton isn't saying that you can't have constraints in your network either. He also isn't saying "no need to learn math", which is a far too common interpretation I've seen. It isn't just data and scale; algorithms are critical too. Just don't force aspects like Gabor filters, symmetry, etc. This doesn't mean work like geometric deep learning is dead (AlphaFold even uses it!). The reason not to force insights is that they sometimes don't hold in high dimensions and sometimes our assumptions are wrong. Forcing them can also limit the path to the optimal/desired solution, even if that solution satisfies those constraints. But I am specifically saying "force" because we can hint, and we are always using some human insight.