by YeGoblynQueenne
1095 days ago
Your model is not sorting correctly and it sure has not learned any "algorithm". At best it has learned to approximate a sorting algorithm. That's what statistical machine learning models do: they are function approximators, not program learners. Also, Machine Learning 101: you test your models on a test set that is disjoint from the training set. To clarify, we do this not because it's in the book and that's the rules, but because, by testing the model on held-out data, we can predict the error the model will have on unseen data (i.e. data not available to the experimenter). And we do this because, under PAC-Learning assumptions, a learner is said to learn a concept when, with high probability, it can label instances of the concept with error below some bound. In real-world situations we do not know the true concept, so we test on held-out data to approximate the probability of error. Bottom line: if you train a model to do a thing and you don't test it carefully to figure out its error, you might claim it's learned something, but in truth you have no idea what it's learned. (To clarify: you tested on the training data, assuming there's a low probability of overlap. Don't do that if you're trying to understand what your models can do.)
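For concreteness, here is a minimal sketch of the held-out evaluation described above, for a list-sorting task. The names `make_disjoint_split` and `held_out_error`, the list length, the vocabulary size and the 20% test fraction are all assumptions made for illustration; none of them come from the post under discussion. `model` stands in for any callable that maps a list to its predicted sorted order.

```python
import random


def make_disjoint_split(num_lists, length, vocab, test_fraction=0.2, seed=0):
    """Sample unique lists, then split them so train and test share no list."""
    rng = random.Random(seed)
    unique = set()
    # Collecting into a set guarantees that no list appears twice.
    while len(unique) < num_lists:
        unique.add(tuple(rng.randrange(vocab) for _ in range(length)))
    examples = sorted(unique)  # fixed order before shuffling, for reproducibility
    rng.shuffle(examples)
    n_test = round(len(examples) * test_fraction)
    return examples[n_test:], examples[:n_test]  # (train, test)


def held_out_error(model, test_set):
    """Fraction of held-out lists whose predicted output is not exactly sorted."""
    wrong = sum(1 for xs in test_set if list(model(xs)) != sorted(xs))
    return wrong / len(test_set)


train, test = make_disjoint_split(num_lists=1000, length=6, vocab=10)
assert not set(train) & set(test)  # disjoint by construction

# A perfect "model" (Python's own sorted) versus one that returns its input.
print(held_out_error(sorted, test))
print(held_out_error(lambda xs: xs, test))
```

The error measured on `test` is only an estimate of the error on unseen data, and it is only meaningful if the model never saw those lists during training.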
|
Transformers can be modeled as RASP programs, and that includes sorting programs. See the Weiss et al. paper, "Thinking Like Transformers" (https://arxiv.org/pdf/2106.06981.pdf).
> Also, Machine Learning 101: you test your models on a test set that is disjoint from the training set. To clarify, we do this not because it's in the book and that's the rules, but because, by testing the model on held-out data, we can predict the error the model will have on unseen data
The probability of a test list existing in the training set is less than 10^-70.
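A rough sketch of how a bound like this can be computed, assuming lists are drawn uniformly at random with replacement. The function name `overlap_probability_bound` and the numbers below (vocabulary size, list length, training set size) are illustrative assumptions, not the actual setup. By the union bound, the chance that one freshly drawn test list equals any of the training lists is at most train_size / vocab_size^list_length.

```python
from fractions import Fraction


def overlap_probability_bound(vocab_size, list_length, train_size):
    """Union bound on P(a random test list also appears in the training set)."""
    total_lists = vocab_size ** list_length  # number of distinct possible lists
    return min(Fraction(1), Fraction(train_size, total_lists))


# Hypothetical numbers: 100 tokens, lists of length 50, 10^9 training lists.
bound = overlap_probability_bound(vocab_size=100, list_length=50, train_size=10**9)
print(float(bound))
```

Exact fractions are used because the numbers involved are far too small to compare reliably after intermediate floating-point arithmetic. Note that this only bounds the chance of an exact duplicate.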