by getnormality
476 days ago
The idea of minimizing complexity is less novel than it may seem. Regularization terms are commonly added to loss objectives in optimization, and these regularizers can often be interpreted as penalizing complexity. Duality allows us to interpret these objectives in multiple ways:

1. Minimize a weighted sum of data error and complexity.

2. Minimize the complexity, so long as the data error is kept below a limit.

3. Minimize the error on the data, so long as the complexity is kept below a limit.

It does seem like classical regularization of this kind has been out of fashion lately. I don't think it plays much of a role in most Transformer architectures. It would be interesting if it made some sort of comeback.

Other than that, I think there are so many novel elements in this approach that it is hard to tell what is doing the work. Their neural architecture, for example, seems carefully hacked to maximize performance on ARC-AGI type tasks. It's hard to see how it generalizes beyond them.
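To make the equivalence between the three formulations concrete, here is a minimal sketch using ridge regression. This is my own toy illustration, not anything taken from the approach being discussed. The data is synthetic, and the function names `ridge_penalized` and `ridge_constrained` are made up for the example. It solves the weighted-sum form in closed form, then solves the "minimize error subject to a complexity limit" form by projected gradient descent, with the limit set to the norm of the first solution. The two should land on the same weights.

```python
import numpy as np

# Synthetic regression problem (an assumption, purely for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=50)


def ridge_penalized(X, y, lam):
    """Form 1: minimize ||Xw - y||^2 + lam * ||w||^2 (closed form)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def ridge_constrained(X, y, radius, steps=5000):
    """Form 3: minimize ||Xw - y||^2 subject to ||w|| <= radius.

    Solved with projected gradient descent: take a gradient step on the
    data error, then project back onto the ball of the given radius.
    """
    w = np.zeros(X.shape[1])
    # Step size 1/L, where L = 2 * sigma_max(X)^2 is the gradient's
    # Lipschitz constant for the squared-error objective.
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)
    for _ in range(steps):
        w = w - step * 2.0 * X.T @ (X @ w - y)
        norm = np.linalg.norm(w)
        if norm > radius:
            w = w * (radius / norm)
    return w


lam = 10.0
w_pen = ridge_penalized(X, y, lam)

# Use the penalized solution's norm as the complexity limit.
w_con = ridge_constrained(X, y, radius=np.linalg.norm(w_pen))

max_gap = float(np.max(np.abs(w_pen - w_con)))
print("max coordinate gap between the two solutions:", max_gap)
```

Each penalty weight corresponds to some complexity limit and vice versa, so sweeping one traces out the same family of solutions as sweeping the other. The second form (minimize complexity subject to an error limit) works the same way with the roles of the two terms swapped.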