
if you train a model on modular arithmetic, it can only learn what's in the training data. if all of the examples are of the form a + b mod 10, it isn't likely to generalize to solving a + b mod 12. a human can learn the rule and figure it out; a model can't. that's why a diverse training set is so important. it's possible to train a model to approximate any function, but as far as I understand, you can't rely on the approximation being accurate outside of the data points you trained on.
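
a rough sketch of that last point, using a 1-nearest-neighbor lookup as a stand-in for a trained model (an assumption for illustration; it is not a neural net, and `target`, `predict`, and the ranges are made up here). it is fit on sin(x) over [0, pi] and then queried both inside and outside that range:

```python
import math

def target(x):
    """The 'true' function the model is supposed to approximate."""
    return math.sin(x)

# Training data: 50 evenly spaced points on [0, pi].
train_x = [i * math.pi / 49 for i in range(50)]
train_y = [target(x) for x in train_x]

def predict(x):
    """1-nearest-neighbor 'model': return the label of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def max_error(xs):
    """Largest absolute prediction error over the query points."""
    return max(abs(predict(x) - target(x)) for x in xs)

# Queries inside the training range (interpolation).
inside = [0.1 + 0.03 * i for i in range(100)]
# Queries beyond the training range (extrapolation).
outside = [math.pi + 0.5 + 0.03 * i for i in range(100)]

interp_error = max_error(inside)
extrap_error = max_error(outside)
print(f"max error inside training range:  {interp_error:.3f}")
print(f"max error outside training range: {extrap_error:.3f}")
```

inside the range the error stays small because every query has a close neighbor in the training set. outside it, the model just repeats the value at the edge of the data while the real function keeps moving.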

different hyperparameters can give a model that is overfit or underfit, but tuning them helps the model interpolate, not generalize. it can know the answers to inputs similar to the training data, but not to inputs different from it.
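
a small sketch of the hyperparameter point, again with a toy stand-in rather than a real network: a k-nearest-neighbor average fit on sin(x) over [0, pi], where `k` plays the role of the hyperparameter (`knn_predict`, the values of `k`, and the query ranges are all assumptions picked for illustration):

```python
import math

# Training data: 50 evenly spaced points on [0, pi], labeled with sin(x).
train_x = [i * math.pi / 49 for i in range(50)]
train_y = [math.sin(x) for x in train_x]

def knn_predict(x, k):
    """Average the labels of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def max_error(xs, k):
    """Largest absolute prediction error over the query points."""
    return max(abs(knn_predict(x, k) - math.sin(x)) for x in xs)

inside = [0.1 + 0.03 * i for i in range(100)]             # within the training range
outside = [math.pi + 0.5 + 0.03 * i for i in range(100)]  # beyond the training range

# Small k tracks the data closely; large k smooths heavily (underfits).
errors = {k: (max_error(inside, k), max_error(outside, k)) for k in (1, 5, 25)}
for k, (e_in, e_out) in errors.items():
    print(f"k={k:2d}  inside error={e_in:.3f}  outside error={e_out:.3f}")
```

changing `k` moves the in-range error a lot, but the out-of-range error stays large for every setting in this toy setup, since no choice of `k` gives the model information about inputs it never saw.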