Hacker News
by yummyfajitas 3839 days ago
Yes. It's a fairly natural outgrowth of falsification-based theories, and is in fact completely necessary.

Consider a simple theory - bear attacks will be very low in the UK forever. Consider an alternate theory - bear attacks will be low until 2016 and then the bearpocalypse happens. Both theories have passed all attempts at falsification - they both accurately predict that bears have so far eaten very few people.

The Bayesian approach is to assign a prior distribution to various theories of this nature. Because there are infinitely many possible priors, most exceedingly complicated (because the set of priors of complexity < C is finite or at least compact), we'll need to (eventually) assign low probabilities to high complexity ones. This gives a natural derivation of Occam's razor as well, at least as an asymptotic law.

A very readable approach to this is a post by Scott Alexander: http://slatestarcodex.com/2014/09/03/the-guardian-vs-inducti...

Wikipedia is also pretty good: https://en.wikipedia.org/wiki/New_riddle_of_induction


> The Bayesian approach is to assign a prior distribution to various theories of this nature. Because there are infinitely many possible priors, most exceedingly complicated (because the set of priors of complexity < C is finite or at least compact), we'll need to (eventually) assign low probabilities to high complexity ones. This gives a natural derivation of Occam's razor as well, at least as an asymptotic law.

I don't think this would be a very compelling argument for Occam's razor if you didn't already believe it. This argument says you can't assign high probability to all "complex" theories, but it doesn't seem to say that the high probability theories must be simple. You could use any criterion at all to single out a high probability subset.

I didn't claim it did - all I said is that this gives Occam's razor as an asymptotic law. Intuitively, I'm claiming:

Lim_{complexity -> infinity} P(theory having fixed complexity) = 0

Stated more precisely, fix a prior distribution, then for any epsilon > 0, I can find a complexity cutoff C (which depends on the prior) so that P(any theory with complexity > C being true) < epsilon.
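
The cutoff claim is easy to check numerically. Below is a minimal sketch, assuming theories are grouped by integer complexity c = 1, 2, 3, ... and that the prior mass per complexity level sums to 1. The names `complexity_cutoff` and `bumpy_prior` are hypothetical, and the example prior is deliberately non-monotone to show that the cutoff exists even when P(theory|complexity) does not decrease steadily.

```python
from fractions import Fraction


def complexity_cutoff(prior_mass, epsilon, max_complexity=10_000):
    """Return the smallest C with P(complexity > C) < epsilon.

    prior_mass(c) is the total prior probability of all theories with
    complexity exactly c (an assumed interface); the masses must sum to 1.
    """
    tail = Fraction(1)  # P(complexity > 0) = 1
    for c in range(1, max_complexity + 1):
        tail -= prior_mass(c)  # now tail = P(complexity > c)
        if tail < epsilon:
            return c
    raise ValueError("no cutoff found below max_complexity")


def bumpy_prior(c):
    """Hypothetical non-monotone prior: 1/4, 1/2, 1/16, 1/8, 1/64, 1/32, ...

    It is a geometric series with neighbouring terms swapped, so it still
    sums to 1 even though it goes up and down.
    """
    exponent = c + 1 if c % 2 == 1 else c - 1
    return Fraction(1, 2 ** exponent)


if __name__ == "__main__":
    for eps in (Fraction(1, 10), Fraction(1, 100)):
        print(eps, complexity_cutoff(bumpy_prior, eps))
```

The cutoff depends on the prior you pass in, which is the sense in which C "depends on the prior" above; a different normalizable prior gives a different C, but some C always exists.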

This doesn't mean that P(theory|complexity) is monotonically decreasing, that would be a much stronger claim.

I don't know how this isn't a compelling argument; it's a provable mathematical statement.

Here's an argument that runs completely parallel:

jwmerrill's razor: points in the plane should be considered to be close to the origin unless there is evidence otherwise.

Is this a reasonable law? As reasonable as Occam's razor? I think probably not, but I don't have a strong opinion. One interesting thing to note is that the law doesn't say where the origin is (similarly, Occam's razor is vague about what exactly is meant by "simple" and "complex").

Finite asymptotic form: for any finite point set, there is a distance D such that no point in the set is further from the origin than D.

Continuous asymptotic form: given any function from points in the plane to non-negative numbers which has a finite integral, there is a distance D such that the integral of the function over the region that is further from the origin than D is less than any epsilon_1, and such that the function is everywhere less than any epsilon_2 on this region except perhaps on a set of measure 0.

The asymptotic forms are provable mathematical statements, but I think it would be a mistake to say that either of them is a very compelling argument for the original statement of "jwmerrill's razor."

Without intending to call you out in particular (I don't know what opinions you hold), I think people sometimes accept some odd logic in probability theory that they would be less likely to accept in other contexts. Bayesian probability theory provides practical solutions to a lot of interesting problems, and I personally wish people would emphasize those cases more, and make fewer sweeping statements about it being a consistent theory of all of the scientific method.

A better statement of jwmerrill's razor: points drawn from a probability distribution are more likely to lie close to the origin than far away from it.

I don't really know why you don't think that the asymptotic forms are evidence in favor of this - a prototypical probability distribution on the real line is a bump somewhere with a decaying tail. And that "somewhere" is far closer to the origin than points out in some arbitrarily distant tail.

Now obviously if you want to make stronger claims about a specific origin, you'll need to specify a particular probability distribution, and justify why that's the right one. I agree that a non-asymptotic Occam's razor is an additional assumption.

But you also get pretty far with the asymptotic theory. Consider a theory of "green" as compared to a theory of "grue" (namely that green turns to blue after some time T). You have a prior with some probability that only green exists (say 50%), and also a 50% chance that green turns to blue after some time T. But now you have a continuous distribution over T.

Now suppose you want to make a prediction - e.g., H = "the grass will be green, not blue, at t=50". When you compute a posterior, you reject all values of T < 0 (supposing the present time is 0). Also, all values of T > 50 actually yield the same prediction as "only green exists". So the only way you can get a prediction of blue at time 50 is if 0 < T < 50. Of course, the more time you spend gathering data, the further into the tail you move and the less likely it is that your posterior will predict blue. I.e., Bayesian stats even with very few assumptions gets sensible results eventually.
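
To make the posterior calculation concrete, here is a minimal sketch under assumptions that are not in the comment above: the switch time T gets an exponential prior starting at time 0, the grass has been observed green up to time `now`, and the prediction target is `now + horizon`. The function name `prob_blue` and the default `rate` are illustrative only.

```python
import math


def prob_blue(now, horizon, rate=0.01, p_green_only=0.5):
    """Posterior probability that the grass is blue at time now + horizon.

    Assumed model: with probability p_green_only the grass stays green
    forever; otherwise it turns blue at a time T with an exponential prior
    (the given rate). Green has been observed at every time up to `now`.
    """
    p_switch = 1.0 - p_green_only
    # Observing green up to `now` rules out every switch time T <= now.
    surviving = p_switch * math.exp(-rate * now)
    # Only now < T <= now + horizon predicts blue at the target time;
    # any later T makes the same prediction as "only green exists".
    blue = p_switch * (math.exp(-rate * now) - math.exp(-rate * (now + horizon)))
    return blue / (p_green_only + surviving)


if __name__ == "__main__":
    for now in (0, 100, 1000):
        print(now, prob_blue(now, horizon=50))
```

With these assumptions the probability of blue shrinks as `now` grows, which is the "sensible results eventually" behaviour: the only switch times left are further and further out in the tail of the prior.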

I do in fact hold the view that Bayesian probability is a consistent theory of the scientific method, and also of how humans should update their beliefs when new evidence is gathered.

(Minor nit: your continuous asymptotic form is slightly wrong for this purpose, f(x) need not approach zero. Counterexample: f(x) = 1 for x \in [1, 1+2^{-1}], [2, 2+2^{-2}], etc, f(x) = 0 elsewhere. That integrates out to 1/2 + 1/4 + ... = 1, but lim_{x -> \infty} f(x) doesn't exist.)
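
For anyone who wants to poke at the counterexample, a small sketch using exact rational arithmetic. The helper names `f` and `integral_up_to` are just for illustration, and inputs are assumed to be non-negative.

```python
from fractions import Fraction


def f(x):
    """Indicator of the union of the intervals [n, n + 2**-n], n = 1, 2, 3, ..."""
    n = int(x)  # only the interval that starts at floor(x) can contain x
    return 1 if n >= 1 and x - n <= Fraction(1, 2 ** n) else 0


def integral_up_to(n_bumps):
    """Exact integral of f over its first n_bumps intervals."""
    return sum(Fraction(1, 2 ** n) for n in range(1, n_bumps + 1))


if __name__ == "__main__":
    print(integral_up_to(20))               # partial integrals stay below 1
    print([f(n) for n in range(1, 8)])      # f is 1 at every positive integer
    print(f(Fraction(5, 2)))                # and 0 in the gaps between bumps
```

The partial integrals equal 1 - 2^{-n} and so converge to 1, while f keeps returning to 1 at every positive integer and to 0 in between, so f(x) has no limit as x grows.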

[I'm also a bit surprised you are being so heavily downvoted. I don't think you are right, but you are hardly so crazily wrong that you should be greyed out.]

> The Bayesian approach is to assign a prior distribution to various theories of this nature. Because there are infinitely many possible priors, most exceedingly complicated (because the set of priors of complexity < C is finite or at least compact), we'll need to (eventually) assign low probabilities to high complexity ones. This gives a natural derivation of Occam's razor as well, at least as an asymptotic law.

Is the asymptotic relevant though? Physical theories are finite and generically quite small, and we have no a priori way to fix C.