by geysersam
783 days ago

Seems like an objection that is slightly beside the point? The claim is not that literally any model gives the same result as a large transformer model; that's obviously false. I think the more generous interpretation of the claim is that the model architecture is relatively unimportant, as long as the model is fundamentally capable of representing the functions it needs to represent in order to fit the data.
His conclusion is that "It implies that model behavior is not determined by architecture, hyperparameters, or optimizer choices. It’s determined by your dataset, nothing else".
There is an implicit assumption here that seems obviously false: that this "convergence point" of predictive performance represents the best that can be done with the data. That in turn implies that current models are perfectly modelling the generative process, namely the human brain.
This seems highly unlikely. If they are perfectly modelling the human brain, then why do they fail so badly at so many tasks? Is it just a lack of training data?