Hacker News new | ask | show | jobs
by aseg 1311 days ago
I encourage everyone to read this paper. It's well written and easy to follow along. To the uninitiated, SR is the problem of finding a mathematical (symbolic) expression that most accurately describes a dataset of input-output examples (regression). The most naive implementation of SR is basically a breath first search starting from the simplest program tree: x -> sin(x) -> cos(x) ... sin(cos(tan(x))) until timeout. However, we can prune out equivalent expressions and, in general, the problem is embarrassingly parallel which alludes to some hope that we can solve this pretty fast (check out PySR[1] for a modern implementation). I find SR fascinating because it can be used for model distillation: learn a DNN approximation and "distill" it to a symbolic program.

Note that the paper talks about the decision version of the SR problem. ie: can we discover the global optimum expression. I think this proof is important for the SR community but not particularly surprising (to me). However, I'm excited by the potential future work for this paper! A couple of discussion points:

* First, SR is technically a bottom up program synthesis problem where the DSL (math) has an equivalence operator. Can we use this proof to impose stronger guarantees on the "hyperparameters" for bottom up synthesis. Conversely, does the theoretical foundation of the inductive synthesis literature [2] help us define tighter bounds?

(EDIT: I was thinking a bit more about this and [2] is a bad reference for this... Jha et al give proofs for synthesis w/ CEGIS where the synthesizer queries a SMT solver for counterexamples until there are none... kinda like a GAN. Apologies.)

* Second, while SR itself is NP hard, can we say anything about the approximate algorithms (eg: distilling a deep neural network to find a solution[3])? Specifically, what proof tell us about the PAC learnability of SR?

Anyhow, pretty cool seeing such work get more attention!

[1] https://github.com/MilesCranmer/PySR

[2] https://susmitjha.github.io/papers/togis17.pdf

[3] https://astroautomata.com/paper/symbolic-neural-nets/

1 comments

Random question, if you don't mind. What does symbolic fitting give you that normal regression doesn't? Usually in physics, an analytic solution (our term) yields physical insight, although not everyone is a physicist and cares about that, hence why machine learning in general is popular.
You might find (Cramer, 1985) interesting! IIRC they go into exactly this problem. However, I can't find an open pdf to link to unfortunately. I'll edit this list tomorrow morning in case I misjudged (Cramer, 1985) or if I find a link..

Here are two hand-wavy arguments that may not be 100% correct:

* Structured Bias: A symbolic regression term allows you -- the scientist -- to control exactly what sort of expressions you expect to see. If I'm looking at data coming from a spring, I expect to see a lot of dampened sinusoids and little quantum physics. SR gives you control over the "programming language" while parametric regression will only allow you to change the number of parameters (not useful in this context).

* Generality: A regression term guarantees the best fit parametric equation as long as you have a comprehensive sample of your data range. A symbolic expression (most of the time) extrapolates beyond the provided data range. In fact, this is one of the constraints in the main proof in the paper (f* should generalize)! Basically: If I only have data for sin(x) from 0 to \pi, PR will find the best fit but there is no guarantee that the best fit will also work in the range \pi to 2\pi.

I want to stress that these aren't established facts and each of these pros actually introduces a lot of cons in the process (what if you introduce incorrect structured bias / what if the "general/simple" solution is actually a little imprecise... Like Newton's laws vs Einstein's theory)! This just means that there is plenty of exciting work to be done!

> while parametric regression will only allow you to change the number of parameters

I suspect you mean "the value of parameters" here.

The regression will 'set the value of the parameters'. OPs point is that when you want to change the regression your main 'hyper-parameter' (i.e. one that you get to choose rather than one that the system determines for you) is the number of parameters.

Most Neural networks have other hyper-paramters, but the ones in SR are probably quite interpretable and intuitive.

In "regular" regression, you have to come up with an input -> output equation. The regression only figures out the best parameters for it.

This might be difficult, especially if you can't visualize the data (many dimensions etc).

For example, a numeric regression for a1 * x1 + a2 * x2 + a3 = y can not fit a relationship like y = x1/x2.

In symbolic regression, the system can automatically discover equations also, not just coefficients.

My initial guess is that regression determines how well you fit an existing pattern or group of patterns but doesn't find out if there is an even better fitting pattern not used. Linear regression will determine how well you fit a linear pattern and while you can use it to search for non-linear patterns (like a regression of home prices against area squared instead of area), it only tests the patterns you put in.
Potentially much better extrapolation from the data, for one thing. The hope with SR is that the equation that is found will in some sense have a deeper connection to the process that generated the data, than a neural network made of lots of RelU units does.
Non linear forms most importantly.
A Fourier expansion is a symbolic fitting of data to a curve, yet it provides no information about the systems sampled as shown by the fact that it fits a geocentric model just as well as a heliocentric one.

The idea that an arbitrarily large expression is somehow more understandable than the fourier coefficients of a the first few large terms could only have been done by someone who hasn't looked the the vast array of semi-empirical formulas out there which are as clear as mud.

Well, I think the hope is that formulae reveal relationships. Notable applications of SR have mostly been to physics and other science phenomena. So, you do some SR to understand the mathematical form of the relationship, and then carry on by trying to understand it causally.