by kxyvr 3260 days ago
I believe the process for deriving fundamental physical models differs from the techniques used in ML. For example, say we want to use the principle of least action to derive an expression for energy similar to what Landau and Lifshitz derive in their book Mechanics. Here, we assume that the motion of a particle is defined by its position and velocity. We assume that the motion of the particle is defined by an optimization principle. We assume Galilean invariance. We assume that space and time are homogeneous and isotropic. Then, putting this all together, we can derive the expression for energy `E = 0.5 m v^2`. At this point, we can validate our model with a series of experiments that curve fit this expression to the results.
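
For reference, the final steps of that derivation can be sketched roughly as below. This is a compressed outline for a single free particle, not the full argument from the book; `L` denotes the Lagrangian and the parenthetical labels are my shorthand for which assumption is used at each step.

```latex
\begin{align*}
L &= L(v^2) && \text{(homogeneity and isotropy)} \\
L &= \tfrac{1}{2} m v^2 && \text{(Galilean invariance)} \\
E &= \mathbf{v} \cdot \frac{\partial L}{\partial \mathbf{v}} - L
   = m v^2 - \tfrac{1}{2} m v^2 = \tfrac{1}{2} m v^2
\end{align*}
```

Each line is tied to a named assumption, which is the "embedded information" the argument relies on.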

Alternatively, we could just run a bunch of experiments on data using ML models. Eventually, someone may have a wonderful idea and realize that we can just reduce the ML model to a parabola. Of course, this is due to intuition and not the ML model. Nevertheless, even though we end up at the same result, I contend the first result is different. It has a huge amount of information embedded into it about the assumptions we made about how the world works. When those assumptions are no longer satisfied, we have a rubric for constructing a fix. For example, if Galilean invariance no longer holds, we can fix the above model using the same sort of derivations to obtain relativistic expressions. Again, we could just throw more data at this new problem and fit an ML model to it, and perhaps someone would stare at this new model and realize that `E = m c^2`. However, I think that's discounting the embedded information in deriving these models and I don't think this information is present in ML models. ML models are generic. Our most powerful physical models are not.
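
To make the "fix" concrete, here is a rough sketch of what replacing Galilean invariance with Lorentz invariance gives for a free particle. It uses the standard relativistic free-particle Lagrangian and the usual definition of energy as `v . dL/dv - L`; the notation is mine and the steps are abbreviated.

```latex
\begin{align*}
L &= -m c^2 \sqrt{1 - v^2/c^2} \\
E &= \mathbf{v} \cdot \frac{\partial L}{\partial \mathbf{v}} - L
   = \frac{m c^2}{\sqrt{1 - v^2/c^2}} \\
E &= m c^2 \quad (v = 0), \qquad
E \approx m c^2 + \tfrac{1}{2} m v^2 \quad (v \ll c)
\end{align*}
```

The low-velocity limit recovers the earlier parabola plus a constant, which is the kind of link between the two models that a generic fitted model would not tell you about.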

Now, sure, once we have the models, we're just going to fit them to the data and it's all just curve fitting. Other fields call this parameter estimation, parameter identification, or a variety of other names. At that point it's all curve fitting. However, again, I contend the process for determining a new model is not.
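
As a small illustration of that parameter estimation step, here is a sketch that fits the single parameter `m` in `E = 0.5 m v^2` to noisy measurements by least squares. The data are synthetic, and the true mass, noise level, and the name `fit_mass` are all made up for the example.

```python
import random

def fit_mass(velocities, energies):
    """Closed-form least-squares estimate of m in E = 0.5 * m * v**2."""
    # The model is linear in m: E = m * phi, with phi = 0.5 * v**2,
    # so the minimizer is sum(phi * E) / sum(phi * phi).
    phi = [0.5 * v * v for v in velocities]
    num = sum(p * e for p, e in zip(phi, energies))
    den = sum(p * p for p in phi)
    return num / den

# Synthetic "experiment": the true mass and noise level are assumptions.
random.seed(0)
true_mass = 2.0
velocities = [0.5 * k for k in range(1, 21)]
energies = [0.5 * true_mass * v * v + random.gauss(0.0, 0.05)
            for v in velocities]

estimated_mass = fit_mass(velocities, energies)
print(estimated_mass)
```

Note that the form of the model was fixed before any data showed up; the fit only pins down `m`.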

2 comments

Of course. "What do I fit this curve to" is a prerequisite to "what is the shape of this curve?"

You shouldn't feel the need to defend theory-based modeling against some imagined incursion from arrogant deep learning researchers. NNs work tremendously well in a few specific problem domains that we had no way to approach otherwise. Elsewhere, they're not much better than any other prediction algorithm. By the way, XGBoost is curve-fitting, too.

I very much agree! Barring some kind of special intuition into the problem, I think ML models are a fantastic tool for building models from empirical data. Even with intuition, sometimes they work as well. My core argument is that anthropomorphizing the algorithms has led to a great deal of confusion as to when we should or should not use these models. I often do computational modeling work with engineers, and many of them are starting to eschew good, foundationally sound models for ML. That's not because the ML models work better (in fact, on many of these problems they work far, far worse), but because good computational modeling is hard and it sounds like all they have to do with ML is teach the algorithms how physics works and how to be an engineer. Since they're good teachers, they should be able to teach the algorithm, right? In reality, it's still dirty, grinding computational modeling work. If we just called these models what they really are, empirical models, I think there'd be far less confusion as to when they should be used.
You haven't explained how the first case isn't "curve fitting": the agents performing the compilation of those facts into the new fact are just spitting out the "best" fit string of symbols based on learned rules, etc. That is something computers can (theoretically) do, and it fits the description "curve fitting" just fine. School (and other education) is training the model they're using to do that compilation, but it's still just "curve fitting" based on reward/punishment signals.

What part of that can't an ML agent learn to do?

From my perspective, you're just describing the "higher order" layers of the network and pretending that humans aren't actually running those functions embedded on deep networks, then proclaiming that deep networks can't do it.

Alright, so from my perspective, curve fitting consists of three things

1. Definition of a model. ML models like multilayer perceptrons use a superposition of sigmoids, but newer models have superpositions of other functions and more nested hierarchies.

2. A metric to define misfit. Most of the time we use least squares because it's differentiable, but other metrics are possible.

3. An optimization algorithm to minimize misfit. Backpropagation is a combination of an unglobalized steepest descent and an automatic-differentiation-like algorithm to obtain the derivatives. However, there is a small crowd that uses Newton methods.

Literally, this means curve fitting is something like the problem

min_{params} 0.5 sum_i || model(params, input_i) - output_i ||^2
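
A minimal sketch of the three pieces in code, under some loud assumptions: the model is a made-up two-parameter parabola, the derivatives are written by hand rather than produced by automatic differentiation, and the step size and iteration count are picked by hand for this toy data. All function names are hypothetical.

```python
def model(params, x):
    # 1. The model: a hypothetical two-parameter parabola a * x**2 + b.
    a, b = params
    return a * x * x + b

def model_grad(params, x):
    # Derivatives of the model with respect to (a, b), written by hand.
    # In practice automatic differentiation would supply these.
    return [x * x, 1.0]

def misfit(params, xs, ys):
    # 2. The metric: 0.5 * sum_i (model(params, x_i) - y_i)**2.
    return 0.5 * sum((model(params, x) - y) ** 2 for x, y in zip(xs, ys))

def misfit_grad(params, xs, ys):
    # Chain rule: sum_i residual_i * d model / d params.
    grad = [0.0] * len(params)
    for x, y in zip(xs, ys):
        r = model(params, x) - y
        for j, d in enumerate(model_grad(params, x)):
            grad[j] += r * d
    return grad

def steepest_descent(params, xs, ys, step=0.05, iters=2000):
    # 3. The optimizer: fixed step, no line search ("unglobalized").
    params = list(params)
    for _ in range(iters):
        g = misfit_grad(params, xs, ys)
        params = [p - step * gj for p, gj in zip(params, g)]
    return params

# Noise-free toy data generated from a = 3, b = 1.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [3.0 * x * x + 1.0 for x in xs]
fitted = steepest_descent([0.0, 0.0], xs, ys)
print(fitted, misfit(fitted, xs, ys))
```

Everything here lives in a space of real-valued parameter vectors, where sums, scalings, and gradients are well defined.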

Of course, there's also a huge number of assumptions in this. First, optimization requires a metric space since we typically want to make sure we're lower than all the points surrounding it. Though, this isn't all that helpful from an algorithmic point of view, so we really need a complete inner product space in order to derive optimality conditions like the gradient of the objective being zero. Alright, fine, that means if we want to do what you say then we need to figure out how to compile these facts into a Hilbert space. Maybe that's possible and it raises some interesting questions. For example, Hilbert spaces have the property that `alpha x + y` also lies in the vector space. If `x` is an assumption like Galilean invariance and `y` is an assumption that time and space are isotropic, I'm not sure what the linear combination would be, but perhaps it's interesting. Hilbert spaces also require inner products to be well defined and I'm not sure what the inner product between these two assumptions is either. Of course, we don't technically need a Hilbert or Banach space to optimize. Certainly, we lose gradients and derivatives, but there may be something else we can do. Of course, that would involve creating an entire new field of computational optimization theory that's not dependent on derivatives and calculus, which would be amazing, but we don't currently have one.

From a philosophical point of view, there may be a reasonable argument that everything in life is mapping inputs to outputs. From a practical point of view, this is hard and the foundation upon which ML is cast is based on certain assumptions like the three components above, which have assumptions on the structures we can deal with. Until that changes, I continue to contend that, no, ML does not provide a mechanism for deriving new fundamental physical models.

What do you think about a Bayesian interpretation of the above as MAP/MLE?

https://arxiv.org/abs/1706.00473

Unless I'm missing something, and I likely am, the linked paper is still based on the fundamental assumptions behind curve fitting that I listed above. Namely, their optimization algorithms, metrics, and models are still based on Hilbert spaces even though they've added stochastic elements and more sophisticated models.
Interesting abstract. I love Bayesian stats so hopefully this will be a fun commute read. Thanks!
I think you're reading way too far into my post. I was just pointing out that our amazing AI revolution is really just a new type of function approximation that has magical-seeming results.
I can't think of a succinct way to describe my response, but I'm not sure we disagree, so much as we're talking about slightly different things.

Regardless, I wanted to thank you for the detailed replies -- having a back and forth helped me ponder my thoughts on the matter.

Have a good one. (:

Thanks for chatting!