| HN Mirror


by Isinlor 1596 days ago

Take a look at this paper:

Deep Symbolic Regression for Recurrent Sequences https://arxiv.org/abs/2201.04600

If you look at the embedding visualization, it is very clear that the model learns the ordering of the numbers.

(Interactive demo: http://recur-env.eba-rm3fchmn.us-east-2.elasticbeanstalk.com... )

There is also:

Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets https://arxiv.org/abs/2201.02177

Again, looking at the visualizations, the model very clearly grasps the structure of the function it models.

2 comments

pfortuny 1596 days ago

Modulo 97 (the arxiv paper). That is what they do.

It is quite easy to grok operations modulo 97.
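For reference, the kind of dataset under discussion can be sketched as follows. This is a minimal illustration, assuming the tasks are binary operations on residues modulo the prime 97, as the comment above says. The function names, the choice of addition and division as example operations, and the 50% training split are assumptions made for illustration, not details taken from the paper.

```python
import random

P = 97  # prime modulus mentioned in the comment above


def mod_add(a: int, b: int, p: int = P) -> int:
    """Return a + b (mod p)."""
    return (a + b) % p


def mod_div(a: int, b: int, p: int = P) -> int:
    """Return a / b (mod p) for b != 0, using Fermat's little theorem for the inverse."""
    return (a * pow(b, p - 2, p)) % p


def build_table(op, p: int = P, skip_zero_divisor: bool = False):
    """Enumerate every equation a op b = c over the residues 0..p-1."""
    return [
        (a, b, op(a, b, p))
        for a in range(p)
        for b in range(p)
        if not (skip_zero_divisor and b == 0)
    ]


def split(table, train_fraction: float = 0.5, seed: int = 0):
    """Shuffle the equations and split them into training and held-out parts."""
    rows = list(table)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


add_table = build_table(mod_add)
div_table = build_table(mod_div, skip_zero_divisor=True)
train, held_out = split(add_table)
print(len(add_table), len(div_table), len(train), len(held_out))
```

The full addition table has only 97 × 97 = 9409 equations, so the whole function fits in a small lookup table. That is the sense in which the domain is small.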


YeGoblynQueenne 1596 days ago

The "Deep Symbolic Regression" paper reports very poor generalisation results that break down after a small n (where n is the number of tokens in the predicted sequence). It works some of the time for n = 1 (predicting the next token), but accuracy drops off for n = 10. As far as I can tell, no results are reported for n > 10 in the "Out of Domain Generalization" section (which is the meat and potatoes of the "generalization" claim).

tl;dr they can sometimes generalise to the next 1 to 10 tokens (digits or operators), but no more.

As far as I know, this kind of short-term "generalisation" on OOD data is standard in neural nets trying to approximate symbolic regressions or things like grammars.

I do like that they use "Out of Domain" rather than "Out of Distribution" as a target, though. That makes more sense.


Isinlor 1596 days ago

I don't think you will find any human who can extrapolate a sequence generated with more than 10 operators. And longer input sequences are actually easier to handle: see Fig. 1, the rightmost graph.

If you think you can do better than their program then:

Seq1: [0, 1, 2, 3, 6, 7, 13, 26, 32, 58, 116, 142, 258, 516]

Seq2: [2, 2, 3, 5, 10, 12, 22, 44, 54, 98, 196, 240, 436, 872]

Seq3: [3, 1, 8, 9, 18, 19, 37, 74, 92, 166, 332, 406, 738, 1476]

Their program is able to guess the correct continuation with one more sequence element.

SHA1 hash for verification: bef5e213340f91258b3b9a0042c9c083dd91cb80
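One way to test a guessed rule against the three sequences above is to regenerate each sequence from its first terms and compare. The sketch below does this for one candidate: a period-3 rule that reproduces every listed term from the fourth one on. The rule and the names `candidate_rule` and `regenerate` are illustrative assumptions. Nothing in the thread confirms that this is the generating formula, and the resulting continuation has not been checked against the SHA1 hash above.

```python
SEQS = {
    "Seq1": [0, 1, 2, 3, 6, 7, 13, 26, 32, 58, 116, 142, 258, 516],
    "Seq2": [2, 2, 3, 5, 10, 12, 22, 44, 54, 98, 196, 240, 436, 872],
    "Seq3": [3, 1, 8, 9, 18, 19, 37, 74, 92, 166, 332, 406, 738, 1476],
}


def candidate_rule(u, n):
    """Hypothetical rule for u[n] (0-based, n >= 3), given the earlier terms."""
    if n % 3 == 0:
        return u[n - 1] + u[n - 2]
    if n % 3 == 1:
        return 2 * u[n - 1]
    return u[n - 1] + u[n - 4]


def regenerate(initial, length, rule=candidate_rule):
    """Rebuild a sequence of the given length from its initial terms."""
    u = list(initial)
    while len(u) < length:
        u.append(rule(u, len(u)))
    return u


for name, seq in SEQS.items():
    # The first three terms are treated as free initial conditions.
    fits = regenerate(seq[:3], len(seq)) == seq
    guess = candidate_rule(seq, len(seq))
    print(name, "fits:", fits, "candidate next element:", guess)
```

A rule that fits the listed terms is only a hypothesis: many different formulas can agree on the first 14 elements and diverge afterwards.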


YeGoblynQueenne 1596 days ago

I don't think I understand what you mean. Aren't all the sequences in the On-Line Encyclopedia of Integer Sequences created by humans? We clearly have the tools to extrapolate sequences from examples, rather than just eyeballing them and trying to guess them. For instance: we have maths. So I must have misunderstood your meaning?


Isinlor 1596 days ago

If you look at the 3 sequences I gave you, can you guess the following elements of each sequence?

We can create sequences, but guessing underlying patterns is a lot more difficult.

Humans will have a very hard time if you go beyond around 10 operators in the pattern used to generate a sequence.

My guess is that their model will be better at it than me or you.


YeGoblynQueenne 1595 days ago

Ah, I think I see what you mean: you are saying that because it's better than humans at predicting the next element in a sequence it's good at generalising. Is that correct, or am I misrepresenting your point?


Isinlor 1595 days ago

Yes.

Basically there are two approaches to sequence prediction.

There is the traditional style (linear regression, ARIMA, RNNs, etc.), where you directly predict the next element in a sequence. The output is on the same level of abstraction as the internal values used in the model.

There is also the new-ish style where you predict symbols instead of predicting the values directly. You can predict symbols representing numbers or you can also predict a symbolic formula that can be used to extrapolate the values perfectly. This is the way humans do it.
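The difference between the two styles can be sketched with a toy example. This is not the paper's method, which uses a trained neural network. Here the "direct" predictor simply repeats the last difference, and the "symbolic" predictor picks a formula from a tiny hand-written candidate list and then extrapolates with it. All names, the candidate list, and the test sequence are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional

# A rule computes u[n] from u[n-1], u[n-2] and the index n.
Rule = Callable[[int, int, int], int]

# Tiny hand-picked hypothesis space (an assumption for illustration).
CANDIDATES: Dict[str, Rule] = {
    "u[n-1] + u[n-2]": lambda a, b, n: a + b,
    "2 * u[n-1]": lambda a, b, n: 2 * a,
    "u[n-1] + n": lambda a, b, n: a + n,
}


def predict_values(seq: List[int], steps: int) -> List[int]:
    """Direct style: output numbers, here by repeating the last difference."""
    out = list(seq)
    step = out[-1] - out[-2]
    for _ in range(steps):
        out.append(out[-1] + step)
    return out[len(seq):]


def predict_symbol(seq: List[int]) -> Optional[str]:
    """Symbolic style: output a formula that reproduces every observed term."""
    for name, rule in CANDIDATES.items():
        if all(rule(seq[n - 1], seq[n - 2], n) == seq[n] for n in range(2, len(seq))):
            return name
    return None


def extrapolate(seq: List[int], formula: str, steps: int) -> List[int]:
    """Apply a chosen formula repeatedly to continue the sequence."""
    rule = CANDIDATES[formula]
    out = list(seq)
    for _ in range(steps):
        out.append(rule(out[-1], out[-2], len(out)))
    return out[len(seq):]


fib = [1, 1, 2, 3, 5, 8, 13]
formula = predict_symbol(fib)
print("direct:  ", predict_values(fib, 3))
print("symbolic:", formula, extrapolate(fib, formula, 3))
```

Once the right formula is found, the symbolic continuation stays exact for any number of steps, while the direct predictor's error grows with each step.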

And my point is that when you look at the symbol embeddings, they do have an interpretable structure that the model can use to generalize. And experiments seem to suggest that DNN models are indeed generalizing.
