by eli_gottlieb 3406 days ago
Ooooookay. So, very long story short...

They're two different academic traditions for what constitutes Good Statistics. They're originally rooted in the philosophical dispute over whether to treat probabilities as frequencies of random outcomes ("frequentist") or as degrees of plausibility ("Bayesian").

In actual fact, a well-trained frequentist knows exactly how and when to use Bayes' rule for gambling, and a well-trained Bayesian knows exactly how and when to publish a paper with a p-value.

The really important difference is over how a whole field expresses its consensus or tradition about what constitutes strong evidence or a plausible theory. A Bayesian would like researchers to elicit priors before experiments (which express something like what reviewers' expectations will be about the experiment), and then calculate posterior distributions after experiments. We could then trade off "weak" and "strong" experiments against prior beliefs, while also reducing publication bias's pernicious effect on statistical strength -- or so Bayesians claim. Bayesian methods are also usually more computationally intensive and can make use of small sample sizes.
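
To make the prior-then-posterior step concrete, here's a minimal sketch. It assumes a simple success/failure experiment with a conjugate Beta prior; the prior, the trial counts, and the function names are all made up for illustration and don't come from any particular field's practice.

```python
from fractions import Fraction

def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Conjugate update: Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return Fraction(a, a + b)

# Hypothetical elicited prior: Beta(2, 2), mildly centered on 0.5.
prior = (2, 2)

# Two hypothetical experiments with the same 75% raw success rate.
weak = beta_binomial_update(*prior, successes=3, failures=1)      # 4 trials
strong = beta_binomial_update(*prior, successes=75, failures=25)  # 100 trials

print("weak posterior:", weak, "mean =", float(beta_mean(*weak)))
print("strong posterior:", strong, "mean =", round(float(beta_mean(*strong)), 3))
```

The small experiment barely moves the estimate away from the prior, while the large one mostly overrides it. That's roughly the "weak" versus "strong" trade-off, at least under these assumptions.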

Frequentists had a lot of disagreements with that sort of thing, and so Neyman, Pearson, Fisher, and the like developed a whole lot of statistical methods that don't rely on ever treating a probability as a belief. They preferred to differentiate clearly between a frequency of experimental outcomes, and what researchers think. They figured that Bayesian "priors" were subjective, biased, and untrustworthy. Also, quite importantly, their methods involved a lot less rote computation and instead made use of impressively large experimental samples.

Depending on which tradition you were raised in, and which philosophers of science you side with, you can argue until the end of the world about which one's better. My advice? Use whatever your field demands you use to publish, but be Bayesian on the inside.


I'm not a statistician, and have only studied frequentist statistics (I assume that's the standard taught in introductory stats courses in school).

Like the person at the root of this thread, I have struggled with explanations on why Bayesian is so great. The answers that worry me tend to be along the lines of "Well, suppose you want the probability for event X (typically a "one-off" event). Frequentist statistics cannot give you an answer (one-off events have no distribution to speak of). But with Bayesian statistics, I can compute a probability for it!"

Yes, but as someone else has pointed out, what the heck do you mean by "probability"? Frequentist statistics is fairly clear on the definition. The whole argument given above seems like he is happy he has some mechanism to get an answer, with little thought about whether he is asking a meaningful question.

Which is why your comment resonates with me:

>They preferred to differentiate clearly between a frequency of experimental outcomes, and what researchers think. They figured that Bayesian "priors" were subjective, biased, and untrustworthy.

I don't want an answer that's dependent on how the person thought. That definitely comes across as subjective to me.

>I don't want an answer that's dependent on how the person thought. That definitely comes across as subjective to me.

Then I think you'll be somewhat disappointed when you learn more about philosophy of science and the core debates over methodology. The biggest problem is: nothing is purely objective. Everything involves assumptions of some sort, otherwise we run head-on into the Problem of Induction, white ravens, No Free Lunch Theorems (on the more machine-learny side), and other such problems.

>Yes, but as someone else has pointed out, what the heck do you mean by "probability"? Frequentist statistics is fairly clear on the definition. The whole argument given above seems like he is happy he has some mechanism to get an answer, with little thought about whether he is asking a meaningful question.

I don't think frequentist statistics are very clear here at all! A p-value, after all, is a likelihood, which frequentist statisticians insist is not a probability, but which the math clearly says is a conditional probability. So when you get a p<0.05 finding, it never means, "We actually ran this experiment under a control hypothesis N times, for some large N, and fewer than 5% of the runs came out this way." It's a measure of counterfactual outcomes, conditional on an assumption which we pretend to expect to be true. When the p-value is small, we then pretend to be surprised, and pretend to make an interesting inference.

I say "pretend" because an ordinary NHST is mathematically equivalent to a Bayesian credible hypothesis test with a uniform prior over the hypotheses. Performing the frequentist test involves pretending to believe that uniform prior, even though you probably actually set up the experiment in order to obtain a significant p-value.

In the end, the NHST is a chiefly social practice, and the p-value is chiefly social evidence. It's a way of convincing peer reviewers to accept (that is, subjectively believe) that you did a real experiment, when they would otherwise skeptically believe that you made it all up (which, unfortunately, some researchers have been known to do!).

Bayesian methods don't get rid of this subjective, social component to science and make everything "objective", any more than you can do that by hiring Mr. Spock to do your statistics. Bayesian methods drag the subjective, social component of prior elicitation out into the sunlight where everyone involved has to acknowledge it. They also give you numbers that are actually about the experiment you really did, as opposed to measuring your experiment against an infinity of counterfactual experiments you never really performed.

(And also they're easier with small sample sizes, their results are more intuitive to interpret, and generative models are more intuitive to think about than test statistics.)
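
Here's a rough sketch of the two kinds of number side by side, for a made-up experiment of 9 heads in 10 coin flips. The function names are hypothetical, the null hypothesis is a fair coin, and the Bayesian side assumes a uniform Beta(1, 1) prior; a different prior would give a different posterior.

```python
from math import comb

def binomial_tail(n, p, k_min):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

def p_value_one_sided(heads, flips, null_p=0.5):
    """Probability of a result at least this extreme, *assuming the null is true*."""
    return binomial_tail(flips, null_p, heads)

def posterior_prob_above(heads, flips, threshold=0.5):
    """P(theta > threshold | data) under an assumed uniform Beta(1, 1) prior."""
    a = heads + 1          # posterior is Beta(a, b)
    b = flips - heads + 1
    # For integer a, b: P(theta <= x) = P(Binomial(a + b - 1, x) >= a)
    return 1 - binomial_tail(a + b - 1, threshold, a)

p_val = p_value_one_sided(9, 10)
post = posterior_prob_above(9, 10)
print(f"one-sided p-value under the fair-coin null: {p_val:.4f}")
print(f"posterior P(theta > 0.5 | data):            {post:.4f}")
```

The first number sums over outcomes that never happened (9 or 10 heads in hypothetical repeats under the null). The second is a statement about the coin's bias given the flips actually observed, conditional on the prior you were willing to write down.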

All that said, I totally have used frequentist statistics (took a very similar class to yours) when called upon to do so. Fighting a philosophy-of-statistics holy war against your higher-ups in the workplace hierarchy is a really bad idea, so however nice Bayesianism or frequentism might sound, sometimes you buckle down and do what ships products and publishes papers.

Your criticism of p-value usage is legitimate. However, this is not core to frequentist statistics.

When I first encountered p-values, even with a frequentist mindset, I saw the huge problem that one could have with them. Many frequentists do not like p-values. I wouldn't be surprised if most actual frequentist statisticians (not those in fields like medicine, psychology, etc) do not like p-value usage.

Attacking p-values is not a valid argument against frequentist statistics.

I'll also add that it seems that many Bayesians are really dying for a number, and because frequentist stats doesn't give it to them, they reach for another tool that will - but with little thought about the validity of the tool. I'm not here to defend frequentist statistics, but just because it doesn't give all the answers, that does not mean that some other tool that does give some answers is correct.

It is just as abusable as p-values are. I suppose if a Bayesian says he used Bayesian approaches because it made sense given his problem, that's fine (and in my mind, he is just being a statistician, not a Bayesian). The self-identified Bayesians I always encounter don't fall into that mold. They fall into the category of "Look what I can compute that I could not with frequentist statistics" - but every attempt I make to understand what that number means fails - they cannot explain it either, beyond "this is how I feel".

I'm not really trying to make an argument against frequentist statistics and for Bayesian ones. I'm more trying to point out what each style exposes (by printing it in your papers) or conceals (by leaving it semi-consciously understood from that one class in grad school).

> be Bayesian on the inside

Strongly disagree, tbh. Picking one side or the other in this debate is silly. Don't "be Frequentist" so as to avoid Bayesian model building techniques since you'll end up stuck all the time and don't "be Bayesian" so as to look down upon simple, workable, un-motivated estimation procedures with good performance.

I didn't mean "look down upon ... workable ... procedures with good performance." I meant a more commonsense sort of "private Bayesianism", where you maintain a healthy skepticism of things that have always failed before, and a healthy reliance on things that have always worked before, even when public scientific discourse purports to show you very strong non-Bayesian evidence.

For example, back in my MSc days, I would run a whole lot of metrics on our dataset, and look for patterns. Sometimes I would find a strong, interesting pattern, and go try to tell my advisor about it. He would ask me to double-check my code for bugs, rerun things, and see if the pattern was still there. Often, it wasn't.

My advisor was nobody's Bayesian, a frequentist (and even a user of purely descriptive statistics, oftentimes) through and through.

So to me, "Bayesian on the inside" ends up meaning, "at least Bayesian enough to look for experimental errors." This attitude has helped me a lot in debugging difficult snafus in industry, too.
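
As a toy version of that reasoning, here's Bayes' rule applied to "is this striking pattern real, or a bug?". Every number below is invented for illustration; they're not measurements from my MSc project or anyone else's.

```python
from fractions import Fraction

def posterior_real(prior_real, p_pattern_if_real, p_pattern_if_artifact):
    """Bayes' rule: P(pattern is real | a striking pattern showed up)."""
    evidence_real = prior_real * p_pattern_if_real
    evidence_artifact = (1 - prior_real) * p_pattern_if_artifact
    return evidence_real / (evidence_real + evidence_artifact)

# Hypothetical inputs: striking patterns have rarely held up before (10%),
# and bugs or noise produce striking-looking output fairly often (30%).
posterior = posterior_real(
    prior_real=Fraction(1, 10),
    p_pattern_if_real=Fraction(8, 10),
    p_pattern_if_artifact=Fraction(3, 10),
)
print(posterior, "=", round(float(posterior), 3))
```

With a track record like that, the pattern is still more likely an artifact than a finding, which is about what "go double-check your code" amounts to.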

I see what you're saying now and quite like that. Thanks for clarifying!