|
|
|
|
|
by HarHarVeryFunny
792 days ago
|
|
OP's claim/observation is that "trained on the same dataset for long enough, pretty much every model with enough weights and training time converges to the same point [of inference performance]". His conclusion is that "It implies that model behavior is not determined by architecture, hyperparameters, or optimizer choices. It’s determined by your dataset, nothing else". There is an implicit assumption here that seems obviously false - that this "convergence point" of predictive performance represents the best that can be done with the data, which is to imply that these current models are perfectly modelling the generative process - the human brain. This seems highly unlikely. If they are perfectly modelling the human brain, then why do they fail so badly at so many tasks? Just lack of training data? |
|
You and me can model the dataset better, but we're already "pre-trained" on reality for decades.
Just because the dataset is large doesn't mean it contains useful information.