by weinzierl 3839 days ago

    Nowadays, as several philosophers at the workshop said, 
    Popperian falsificationism has been supplanted by Bayesian 
    confirmation theory, or Bayesianism, a modern framework 
    based on the 18th-century probability theory of the English 
    statistician and minister Thomas Bayes. Bayesianism allows 
    for the fact that modern scientific theories typically make 
    claims far beyond what can be directly observed — no one has 
    ever seen an atom — and so today’s theories often resist a 
    falsified-unfalsified dichotomy. Instead, trust in a theory 
    often falls somewhere along a continuum, sliding up or down 
    between 0 and 100 percent as new information becomes 
    available. “The Bayesian framework is much more flexible” 
    than Popper’s theory, said Stephan Hartmann, a Bayesian 
    philosopher at LMU. “It also connects nicely to the 
    psychology of reasoning.”
I've never heard of Bayesianism in this context. Is this a serious approach in the philosophy of science?
5 comments

Yes. It's a fairly natural outgrowth of falsification-based theories, and is in fact completely necessary.

Consider a simple theory - bear attacks will be very low in the UK forever. Consider an alternate theory - bear attacks will be low until 2016 and then the bearpocalypse happens. Both theories have passed all attempts at falsification - they both accurately predict that bears have so far eaten very few people.

The Bayesian approach is to assign a prior distribution to various theories of this nature. Because there are infinitely many possible priors, most exceedingly complicated (because the set of priors of complexity < C is finite or at least compact), we'll need to (eventually) assign low probabilities to high complexity ones. This gives a natural derivation of Occam's razor as well, at least as an asymptotic law.
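
A rough sketch of why the data so far can't separate the two bear theories: both assign the same likelihood to "very few bear attacks up to now", so Bayes' rule hands back exactly the prior. The prior values (99% vs 1%) and the 0.9 likelihood are made-up numbers for illustration, and `posterior` is a hypothetical helper, not anything from a library.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' rule over a finite list of theories: prior * likelihood, renormalised."""
    unnormalised = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

# Assumed priors: "low forever" vs "low until 2016, then bearpocalypse".
priors = [Fraction(99, 100), Fraction(1, 100)]
# Both theories predict the observed data equally well (assumed value 9/10).
likelihoods = [Fraction(9, 10), Fraction(9, 10)]

print(posterior(priors, likelihoods))
```

With equal likelihoods the posterior is the prior again, so whatever preference we have between the two theories has to come from the prior.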

A very readable approach to this is a post by Scott Alexander: http://slatestarcodex.com/2014/09/03/the-guardian-vs-inducti...

Wikipedia is also pretty good: https://en.wikipedia.org/wiki/New_riddle_of_induction

> The Bayesian approach is to assign a prior distribution to various theories of this nature. Because there are infinitely many possible priors, most exceedingly complicated (because the set of priors of complexity < C is finite or at least compact), we'll need to (eventually) assign low probabilities to high complexity ones. This gives a natural derivation of Occam's razor as well, at least as an asymptotic law.

I don't think this would be a very compelling argument for Occam's razor if you didn't already believe it. This argument says you can't assign high probability to all "complex" theories, but it doesn't seem to say that the high probability theories must be simple. You could use any criterion at all to single out a high probability subset.

I didn't claim it did - all I said is that this gives Occam's razor as an asymptotic law. Intuitively, I'm claiming:

Lim_{complexity -> infinity} P(theory having fixed complexity) = 0

Stated more precisely, fix a prior distribution, then for any epsilon > 0, I can find a complexity cutoff C (which depends on the prior) so that P(any theory with complexity > C being true) < epsilon.

This doesn't mean that P(theory|complexity) is monotonically decreasing, that would be a much stronger claim.
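
To make the cutoff claim concrete, here is a minimal sketch, assuming theories are grouped by an integer complexity k = 1, 2, 3, ... and the prior gives each group a total mass. Both example priors (`geometric_mass` and `zigzag_mass`) are hypothetical and chosen only because they sum to 1. The second one is not monotonically decreasing, yet a cutoff still exists.

```python
from fractions import Fraction

def geometric_mass(k: int) -> Fraction:
    """Assumed prior: complexity k gets mass 2**-k (monotone decreasing)."""
    return Fraction(1, 2 ** k)

def zigzag_mass(k: int) -> Fraction:
    """Assumed prior: same masses with each adjacent pair swapped (not monotone)."""
    return Fraction(1, 2 ** (k + 1)) if k % 2 == 1 else Fraction(1, 2 ** (k - 1))

def complexity_cutoff(mass, epsilon: Fraction) -> int:
    """Smallest C with P(complexity > C) < epsilon, for a prior that sums to 1."""
    c = 0
    tail = Fraction(1)  # P(complexity > 0) = 1
    while tail >= epsilon:
        c += 1
        tail -= mass(c)  # remove the mass at complexity c from the tail
    return c

print(complexity_cutoff(geometric_mass, Fraction(1, 100)))
print(complexity_cutoff(zigzag_mass, Fraction(1, 100)))
```

The loop terminates for any epsilon > 0 precisely because the prior sums to 1, which is the whole content of the asymptotic claim. The cutoff it finds depends on the prior, as stated above.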

I don't know how this isn't a compelling argument, it's a provable mathematical statement.

Here's an argument that runs completely parallel:

jwmerrill's razor: points in the plane should be considered to be close to the origin unless there is evidence otherwise.

Is this a reasonable law? As reasonable as Occam's razor? I think probably not, but I don't have a strong opinion. One interesting thing to note is that the law doesn't say where the origin is (similarly, Occam's razor is vague about what exactly is meant by "simple" and "complex").

Finite asymptotic form: for any finite point set, there is a distance D such that no point in the set is further from the origin than D.

Continuous asymptotic form: given any function from points in the plane to non-negative numbers which has a finite integral, there is a distance D such that the integral of the function over the region that is further from the origin than D is less than any epsilon_1, and such that the function is everywhere less than any epsilon_2 on this region except perhaps on a set of measure 0.

The asymptotic forms are provable mathematical statements, but I think it would be a mistake to say that either of them is a very compelling argument for the original statement of "jwmerrill's razor."

Without intending to call you out in particular (I don't know what opinions you hold), I think people sometimes accept some odd logic in probability theory that they would be less likely to accept in other contexts. Bayesian probability theory provides practical solutions to a lot of interesting problems, and I personally wish people would emphasize those cases more, and make fewer sweeping statements about it being a consistent theory of all of the scientific method.

A better statement of jwmerrill's razor: points drawn from a probability distribution are more likely to lie near the origin than far away.

I don't really know why you don't think that the asymptotic forms are evidence in favor of this - a prototypical probability distribution on the real line is a bump somewhere with a decaying tail. And that "somewhere" is far closer to the origin than points out in some arbitrarily distant tail.

Now obviously if you want to make stronger claims about a specific origin, you'll need to specify a particular probability distribution, and justify why that's the right one. I agree that a non-asymptotic Occam's razor is an additional assumption.

But you also get pretty far with the asymptotic theory. Consider a theory of "green" as compared to a theory of "bleen" (namely that green turns to blue after some time T). You have a prior with some probability that only green exists (say 50%), and also a 50% chance that green turns to blue after some time T. But now you have a continuous distribution over T.

Now suppose you want to make a prediction - e.g., H = "the grass will be green, not blue, at t=50". When you compute a posterior, you reject all values of T < 0 (supposing the present time is 0). Also, all values of T > 50 actually yield the same prediction as "only green exists". So the only way you can get a prediction of blue at time 50 is if 0 < T < 50. Of course, the more time you spend gathering data, the further into the tail you move and the less likely it is that your posterior will predict blue. I.e., Bayesian stats even with very few assumptions gets sensible results eventually.
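
A minimal sketch of that calculation, under assumptions that are mine rather than part of the argument: time is measured from when we started observing, the switch time T has an exponential prior with mean `mean_switch`, and we have seen green continuously up to time `observed`. The function name `prob_blue` and all parameter values are hypothetical.

```python
import math

def prob_blue(observed: float, horizon: float = 50.0,
              p_green_forever: float = 0.5, mean_switch: float = 100.0) -> float:
    """Posterior P(blue at time observed + horizon | green seen up to `observed`)."""
    p_bleen = 1.0 - p_green_forever
    # Prior probability that the switch time T is still ahead of us (T > observed).
    survive = math.exp(-observed / mean_switch)
    # Prior probability that T lands in (observed, observed + horizon].
    switch_soon = survive - math.exp(-(observed + horizon) / mean_switch)
    # Theories with T <= observed are refuted; renormalise over what is left.
    evidence = p_green_forever + p_bleen * survive
    return p_bleen * switch_soon / evidence

for observed in (0, 100, 500, 1000):
    print(observed, prob_blue(observed))
```

The longer green has been observed, the further into the tail of the prior on T the surviving "bleen" theories sit, so the predicted probability of blue shrinks. A different prior on T would change the numbers but not that qualitative behaviour, as long as the prior integrates to 1.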

I do in fact hold the view that Bayesian probability is a consistent theory of the scientific method, and also of how humans should update their beliefs when new evidence is gathered.

(Minor nit: your continuous asymptotic form is slightly wrong for this purpose, f(x) need not approach zero. Counterexample: f(x) = 1 for x \in [1, 1+2^{-1}], [2, 2+2^{-2}], etc, f(x) = 0 elsewhere. That integrates out to 1/2 + 1/4 + ... = 1, but lim_{x -> \infty} f(x) doesn't exist.)
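
A quick check of the counterexample, as a sketch with exact rational arithmetic (the helper names `f` and `integral_up_to` are just for illustration): the bump widths sum towards 1, yet f still takes the value 1 at arbitrarily large x.

```python
import math
from fractions import Fraction

def f(x: Fraction) -> int:
    """1 on [n, n + 2**-n] for n = 1, 2, 3, ...; 0 elsewhere."""
    n = math.floor(x)
    return 1 if n >= 1 and x - n <= Fraction(1, 2 ** n) else 0

def integral_up_to(n_max: int) -> Fraction:
    """Exact integral of f over [0, n_max + 1): the sum of the first n_max bump widths."""
    return sum((Fraction(1, 2 ** n) for n in range(1, n_max + 1)), Fraction(0))

print(integral_up_to(10))                      # partial integral, approaches 1
print(f(Fraction(1000)))                       # on a bump far from the origin
print(f(Fraction(1000) + Fraction(1, 2)))      # between bumps
```

So the tail integral goes to zero while f itself keeps returning to 1, which is why the pointwise part of the continuous form fails.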

[I'm also a bit surprised you are being so heavily downvoted. I don't think you are right, but you are hardly so crazily wrong that you should be greyed out.]

> The Bayesian approach is to assign a prior distribution to various theories of this nature. Because there are infinitely many possible priors, most exceedingly complicated (because the set of priors of complexity < C is finite or at least compact), we'll need to (eventually) assign low probabilities to high complexity ones. This gives a natural derivation of Occam's razor as well, at least as an asymptotic law.

Is the asymptotic relevant though? Physical theories are finite and generically quite small, and we have no a priori way to fix C.

Yes, this is real and it can be done rigorously. The big problem is, "How do we come up with reasonable priors?"

It is still a minority position.

> The big problem is, "How do we come up with reasonable priors?"

Predictably, as a hardcore supporter of Popper's version, I don't think this problem will be solved.

Empirical Bayes methods (in which the priors come from observed sample data) are a thing.
http://andrewgelman.com/2015/12/17/gathering-of-philosophers...

Andrew Gelman is an author of "Bayesian Data Analysis", the most popular of the advanced texts, which is considered the standard.

He says this is false: Bayesian falsificationism is better than confirmation theory, and Deborah Mayo, the founder of error statistics, essentially agrees with him.

The linked paper of Gelman is a really good read on Bayesian thinking in philosophy of science:

http://www.stat.columbia.edu/~gelman/research/published/phil...

That sounds like a theorist dodging the issue.

The use of Bayesian statistical methods in papers is reasonably normal, although still the minority position.

The use of Bayesian falsification or confirmation theory as a philosophy of science is increasingly common.

The argument that we should switch to Bayesianism in situations where we cannot sample from the likelihood function (i.e., perform an experiment which yields evidence distinguishing one hypothesis from another) is a special kind of bullshit where the attempt at "science" is being packed into prior formation to favor someone's pet theory.

Falsification also has the problem of unknown unknowns, which is what gave LENR, or cold fusion, its enormous setback. They simply did not understand the loading ratio at that time. A negative replication does not disconfirm a process; it shows only that this particular atomic arrangement did not work.