by williamtrask
1596 days ago
Poor performance is more likely due to how transformer neural networks view numbers. They memorise them like words instead of modeling their numerical structure. Thus even if a model has seen the numbers 3456 and 3458, it knows nothing of 3457. Totally different embedding. It’s like a kid memorising a multiplication table instead of learning the more general principle of multiplication (related: this illusion is why big models are so popular. Memorise more stuff.) Paper (NeurIPS/DeepMind): https://arxiv.org/abs/1808.00508
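To make the "totally different embedding" point concrete, here is a rough sketch, not taken from the paper. It assumes a toy vocabulary in which every number seen in training is its own token with an independently initialised vector; the names `embedding_table`, `token_embedding`, and `numeric_feature` are made up for illustration. Real tokenizers usually split numbers into sub-word pieces, so treat this as a caricature of the idea rather than how any particular model works.

```python
import random

random.seed(0)
DIM = 4  # toy embedding size (assumption)

# Hypothetical toy vocabulary: each number seen in training is an opaque token
# with its own randomly initialised vector.
seen_numbers = [3456, 3458]
embedding_table = {
    str(n): [random.gauss(0.0, 1.0) for _ in range(DIM)] for n in seen_numbers
}
unk_vector = [0.0] * DIM  # fallback for tokens never seen in training


def token_embedding(n):
    """Look the number up as a word; unseen numbers fall back to UNK."""
    return embedding_table.get(str(n), unk_vector)


def numeric_feature(n):
    """Contrast: a representation that keeps the numeric value itself."""
    return [float(n)]


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


# The lookup table has nothing for 3457, even though both neighbours were seen.
print(token_embedding(3457) == unk_vector)

# The value-preserving feature places 3457 exactly between its neighbours.
print(distance(numeric_feature(3456), numeric_feature(3457)))
print(distance(numeric_feature(3457), numeric_feature(3458)))
```

In the lookup table, the vectors for 3456 and 3458 are unrelated unless training happens to pull them together, and 3457 gets no information from either. The value-preserving feature gets the ordering for free, which is the kind of inductive bias the comment is arguing for.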
|
Deep Symbolic Regression for Recurrent Sequences https://arxiv.org/abs/2201.04600
If you look at the embedding visualization, it is very clear that the model learns the order of numbers.
(Interactive demo: http://recur-env.eba-rm3fchmn.us-east-2.elasticbeanstalk.com... )
There is also:
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets https://arxiv.org/abs/2201.02177
Again, looking at the visualizations, the model very clearly grasps the structure of the function it models.