by XorNot 1156 days ago
Gravitation was literally about predicting future positions of the stars, and was successful because it did so much better than any geocentric model. How is that not a loss minimization activity?

And before we had it, epicycles were steadily increasing in complexity to explain every new local astronomical observation, but that model was popular because it gives a very efficient initial fit of the easiest data to obtain (i.e. the moon actually does go around the Earth, and with only one reference point the Sun appears to go around the Earth too). But of course once you have a heliocentric theory, you can throw away all those parameters, and every new prediction lines up nearly perfectly (allowing for how much longer it took before orbital measurements were precise enough to need Relativity to model them fully).


When the law of gravitation was formulated, it could not in fact be used to predict orbits reliably: Kepler's ellipses are already the solution to the two-body problem, and for anything more complex, integration to any useful precision was impossible at the time. Kepler's theories also came out long before it did.

It took more than 70 years after its formulation for the law to be conclusively tested against observations.

Also note that Copernicus' heliocentric model retained the geocentric model's epicycles on circular orbits. It really took Kepler to make a better model. And it was better because it was explanatory to boot, and not only predictive.

At some point, the metaphor of "loss minimisation" starts to break down. When we're talking about science, there's much more we want to do than minimise some loss function, one that nobody has ever written down anyway. We want to be able to say "this is how the world works". The language of function optimisation is simply not the right language to do anything like that.

Even Vladimir Vapnik turned to poetry to try and increase the information available to statistical learners. Let me see if I can find that paper...

Sure, but it was a better fit, and before that heliocentric models were definitely the only way forward that didn't keep adding terms every time someone spotted a moon.

Occam's razor - do not multiply terms without necessity - is essentially a loss function.
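One way to make that reading concrete is a complexity-penalized loss: data misfit plus a cost per term. A minimal Python sketch, assuming a squared-error misfit and an arbitrary per-term penalty. The name `penalized_loss`, the penalty value, and the toy residuals are all illustrative and not from the thread:

```python
def penalized_loss(residuals, n_terms, penalty_per_term=1.0):
    """Squared-error misfit plus a cost for every term in the model."""
    misfit = sum(r * r for r in residuals)
    return misfit + penalty_per_term * n_terms


# Toy, made-up residuals (not real astronomical data).
# The many-term model fits slightly better but carries far more terms.
few_terms = penalized_loss([0.5, -0.4, 0.3], n_terms=2)
many_terms = penalized_loss([0.1, -0.1, 0.1], n_terms=12)

preferred = "few terms" if few_terms < many_terms else "many terms"
print(preferred)
```

With `penalty_per_term=0.0` the many-term model wins on misfit alone, so under this reading the razor lives entirely in a penalty that somebody has to choose.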

You're talking about Kepler's model here, not about the gravitational equation. The gravitational equation was not a better fit than Kepler at that time, especially since it used unknown constants.

So would you care to comment on how this relates to the original contention, which is the claim that a loss function could not discover Newton's law of gravitation?

Because what you're arguing, extensively, is that due to lack of fit, Newton's Law of Gravitation wasn't settled science until observational data was of sufficient fidelity to clearly distinguish it.

Which sure sounds like a loss function.

Formulate the loss function -- you'll find it's just

    loss(the-right-answer(perfect-x) - perfect-y)

The most important aspect of "the-right-answer" is its ability to ignore almost all the data.

The existence of planets is "predictable" from the difference between the data and the theory -- if the theory is just a model of the data, it has no capacity to do this.

If you want to "do physics" by brute force optimization you'd need to have all possible measures, all possible data, and then a way of selecting relevant causal structures in that data -- and then be able to try every possible model.

    loss(Model(all-data|relevant-causal-structures) - Filter(...|...)) forall Model

Of course, (1) this is trivially not computable (equivalent to computing the reals); (2) "all possible data with all possible measures" doesn't exist; and (3) selecting relevant causal structure requires having a primitive theory not derived from this very process.

Animals solve this in reverse order: (3) is provided by the body's causal structure; (2) is obtained by using the body to experiment; and (1) we imagine simulated ways-the-world-might-be to reduce the search space down to a finite size.

ie., we DO NOT make theories out of data. We first make theories then use the data to select between them.

This is necessary, since a model of the data (ie., modern AI, ie., automated statistics, etc.) doesn't decide between an infinite number of theories of how the data came to be.
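The "theories first, data second" step can be sketched in a few lines of Python. This is a hedged illustration, not anything from the thread: the three candidate power laws are assumed to be written down in advance, and rounded textbook values for the planets' semi-major axes (AU) and orbital periods (years) are used only to choose among them.

```python
# Approximate semi-major axis (AU) and orbital period (years); rounded textbook values.
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}

# Candidate theories fixed before looking at the data: period = axis ** exponent.
CANDIDATE_EXPONENTS = [1.0, 1.5, 2.0]


def squared_error(exponent):
    """Total squared gap between a candidate's predicted and observed periods."""
    return sum((axis ** exponent - period) ** 2 for axis, period in PLANETS.values())


errors = {exponent: squared_error(exponent) for exponent in CANDIDATE_EXPONENTS}
best_exponent = min(errors, key=errors.get)
print(best_exponent, errors)
```

The data picks the exponent 1.5 (Kepler's third law) from the list, but the list itself had to come from somewhere else; nothing in this loop proposes a candidate that was not already written down.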

> ie., we DO NOT make theories out of data. We first make theories then use the data to select between them.

No, we don't. We make hypotheses and then test them. Hypotheses are based on data.

There are physics experiments being done right now where the exact hope is that existing theory has not predicted the result they produce, because then we'd have data to hypothesize something new.[1]

You are literally describing what deep learning techniques are designed to do while claiming they can't possibly do it.

[1] https://www.scientificamerican.com/article/measurement-shows...

The whole point is that Newton came up with the law before there was observational data that could prove it, which is fundamentally different from regression. The data is used to reject the theory, not to form it, here.

I get the feeling that the OP is using "loss function" in the figurative sense, and not in the sense of an actual loss function that is fit to observations. We know nobody did that in Newton's time. In Newton's time they didn't even have the least squares method, let alone fit a model to observations by optimising a loss function.

To clarify, the OP is pointing out that it wasn't Newton's law of universal gravitation that defeated the epicyclical model of the cosmos.

It was Kepler's laws of planetary motion that did for epicycles, and that happened 70-ish years before Newton stated his laws of motion and pointed out that they basically subsume Kepler's laws of planetary motion.