|
|
|
|
|
by tarvaina
1103 days ago
|
|
The ingredients you need for training a useful machine learning model are expressivity, learnability, and generalization. Many methods are universal approximators but that only takes care of the first ingredient. Arguably the reason neural networks are so successful is that they can offer a good balance between the three. Before transformers we built different neural network architectures for each domain. These architectures offered better inductive biases for their respective domains and thus traded off some of the expressivity for better learnability and generalization. Nowadays the best architectures seem to be merging towards transformers. They appear to offer more generally useful inductive biases and thus a better trade-off between the three ingredients than the earlier architectures. |
|