Hacker News
by samch93 2570 days ago
As someone who has a master's degree in statistics and often uses Bayesian statistics, I think we should not focus on whether one is a Bayesian or a frequentist, but rather be pragmatic and take the most practical approach to solving a statistical problem. Moreover, I think statistical education should start with frequentist concepts and then extend them to the Bayesian framework, since the likelihood also plays a major role in obtaining the posterior distribution. In my opinion, this progression is much more natural than starting fully Bayesian.
5 comments

Respectfully, I find that people who are not statisticians overwhelmingly disagree with your point on which should be taught first.

Bayesian is a natural order of inference for people. The whole concept of the black swan ("all swans are white") proves this out.

Frequentist statistics is much less intuitive to people.

My preference is for people to be able to use some statistics, and Bayesian gets them productive faster.

Frequentist statistics is often pretty poorly taught. Ideas like likelihood, modeling, and optimization underlie the mechanics of both worlds. There's a big obsession with testing, but the Neyman-Pearson testing framework is sound and intuitive.

Bayesian statistics gets a big boost because it's usually taught as a system instead of as a recipe book.

I would argue that the problem with frequentist statistics is that it aligns with humans' flawed intuition of how randomness works. People are inherently obsessed with finding patterns to support their hypotheses.

The problem is that events we perceive as random and extremely unlikely are in fact much more probable than Gaussian methods would estimate. And the frequentist approach helps to create this distortion by ignoring black swans.

Here's a great video demonstrating how people tend to misunderstand randomness: https://youtu.be/tP-Ipsat90c

One approach gives the right answer. The other approach is more computationally tractable. Computers are pretty powerful now, so we can afford the correct answer much more often than we used to.

As for what is more natural… I've seen a (frequentist) introduction to statistics, and it simply did not make sense. Nothing was justified, you just had to learn the stuff by rote and apply it in situations that look like they could use one tool or another.

Probability theory on the other hand is pretty obvious. The axioms required to derive it are ridiculously few and ridiculously intuitive. From there you get the sum and product rules, and all the rest. Always made perfect sense to me.

I am surprised by how many people equate frequentist statistics with Neyman-Pearson hypothesis testing. In my opinion, the main difference between the two approaches is whether the parameters of a statistical model are considered fixed or random; everything else follows from this.

On the subject of statistical education: The point I tried to make is that I think it is much easier to first study the likelihood, the central quantity of frequentist inference. One can then go to the Bayesian world simply by allowing the parameters to be random variables. Furthermore, as other commenters have pointed out, technical difficulties arise in the non-conjugate Bayesian setting when MCMC sampling has to be used. In my opinion, MCMC algorithms, convergence diagnostics, etc. are certainly not topics for an intro stats course.
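
A rough sketch of that progression for the simplest conjugate case, a binomial success probability. The data (7 successes in 10 trials), the flat Beta(1, 1) prior, and all function names are assumptions made up for illustration, not anything from the thread:

```python
from fractions import Fraction

def mle(successes, trials):
    # Frequentist step: the binomial likelihood is maximized at k / n.
    return Fraction(successes, trials)

def beta_posterior(successes, trials, a=1, b=1):
    # Bayesian step: treat the parameter as random with a Beta(a, b) prior.
    # Conjugacy means the posterior is Beta(a + k, b + n - k), no sampling needed.
    return a + successes, b + trials - successes

def beta_mean(a, b):
    return Fraction(a, a + b)

def beta_mode(a, b):
    # Valid for a > 1 and b > 1.
    return Fraction(a - 1, a + b - 2)

k, n = 7, 10  # hypothetical data
a_post, b_post = beta_posterior(k, n)
print("MLE:", mle(k, n))
print("posterior:", (a_post, b_post))
print("posterior mean:", beta_mean(a_post, b_post))
print("posterior mode:", beta_mode(a_post, b_post))
```

Under the flat prior the posterior mode coincides with the maximum likelihood estimate, which is one way to see the likelihood as the shared core of both approaches.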

Having taught frequentist stats as a TA to grad students, I understand why frequentist stats seems not to make sense. On the other hand, my prior on teaching quality and my data on the relative difficulty of understanding the approaches say with near-certainty that your experience has nothing to do with the approach taken.

Having used Bayesian stats heavily, I'd note that the hard parts are not gone, they are just located elsewhere - in how to actually do the computations, rather than in how to set up problems. Each can be taught poorly or well, but given that MCMC is certainly harder than least-squares, it seems difficult to argue that using Bayesian statistics is easier. (Unless you're just applying the methods by rote and letting the computer spit out answers - and if you are, I don't know why you are better off with Bayesian methods. In fact, if that's what you're doing, please stop doing statistics and pay an expert instead.)

> given that MCMC is certainly harder than least-squares, it seems difficult to argue that using Bayesian statistics is easier.

Actually, I am not saying Bayesian statistics are easier to use. I was saying they looked easier to understand. Though I must point out that "Bayesian" may be the wrong word here. What truly makes sense to me is Probability Theory, which Edwin T. Jaynes describes pretty well.

(That does not make me any more capable at applying MCMC, which I don't even know of. Searching… Ah, Markov Chain Monte Carlo, yeah that's not easy. Plus, this sounds like an approximation of probability theory… not that we have anything better, mind you: I know that applying probability theory directly is often computationally intractable.)
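
For a feel of what the "approximation" looks like, here is a minimal random-walk Metropolis sketch (one of the simplest MCMC algorithms). It targets a posterior whose exact answer is known, so the approximation error can be checked. The data (7 successes in 10 trials), the flat prior, the step size, the burn-in length, and the function names are all assumptions chosen for illustration:

```python
import math
import random

def log_posterior(theta, successes, trials):
    # Flat prior on (0, 1) plus binomial log-likelihood, up to a constant.
    if not 0.0 < theta < 1.0:
        return -math.inf
    return successes * math.log(theta) + (trials - successes) * math.log(1.0 - theta)

def metropolis(successes, trials, draws=20000, burn_in=1000, step=0.2, seed=1):
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, successes, trials)
    samples = []
    for i in range(draws + burn_in):
        # Symmetric Gaussian proposal around the current value.
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, successes, trials)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:
            samples.append(theta)
    return samples

samples = metropolis(successes=7, trials=10)
mcmc_mean = sum(samples) / len(samples)
exact_mean = 8 / 12  # mean of the exact Beta(8, 4) posterior
print("MCMC estimate:", round(mcmc_mean, 3), "exact:", round(exact_mean, 3))
```

The sampler only ever evaluates the unnormalized posterior, which is why the same loop still works when no closed form exists. The hard parts in practice (tuning the step size, judging convergence) are exactly what this toy case hides.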

I agree with this approach, and this is roughly the approach my own Statistics master's degree takes as well. It can be challenging to understand the finer points of likelihoods and posteriors (and how to choose a prior) without serious mathematics that you're unlikely to have upon entering a graduate statistics degree.

Starting with applied probability and applied statistics (incl. regression, ANOVA, GLMs) allows you to solve problems and feel useful and engaged before being thrown into the mathematical rigor required of Bayesian statistics.

I agree, although I respect those who look for deeper justification for the methods we use. Bayesian statistics/decision theory does have axiomatic foundations after all.
So does frequentist stats - they are just different axiomatic foundations and assumptions.
I'm less familiar with them - I've certainly seen many plausible frequentist arguments, but I've never been exposed to any unifying framework which would require that one make decisions based on type-1 error rate controlling hypothesis tests. That's not to say such foundations don't exist, I'm just happy being a philosophical Bayesian who sometimes does frequentist or algorithmic/ML things for practical reasons.
In an introductory course, we should be teaching people to collect enough data that any reasonable choice of prior or method doesn't matter that much.
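
A small sketch of what "enough data that the prior doesn't matter much" can look like, using a binomial proportion with two deliberately different Beta priors. The observed rate of 0.3, the priors Beta(1, 1) and Beta(20, 5), and the function names are hypothetical choices for illustration:

```python
def posterior_mean(successes, trials, a, b):
    # Mean of the Beta(a + k, b + n - k) posterior for a binomial proportion.
    return (a + successes) / (a + b + trials)

def prior_gap(trials, rate=0.3):
    # Disagreement between a flat prior and a strongly opinionated one
    # after both have seen the same data.
    successes = round(rate * trials)
    flat = posterior_mean(successes, trials, 1, 1)
    opinionated = posterior_mean(successes, trials, 20, 5)
    return abs(flat - opinionated)

for n in (10, 100, 10_000):
    print(n, round(prior_gap(n), 4))
```

With these made-up numbers the two posteriors disagree substantially at n = 10 and are nearly indistinguishable at n = 10,000. How much data counts as "enough" depends on how strong the competing priors are.
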
I started college in 1982. At that time, calculators were common, but not computers. The data sets had to be small enough for us to work problems by hand. Not any more. I see no reason why a stats course can't start out with big bright data sets that are easy to analyze, then advance through more difficult problems where it becomes progressively easier to get things wrong, and thus requires more sophistication to think about problems.

I just want to add a bit more. It's quite easy today to generate and play with random numbers. If you think you understand the process that generated your data, simulate it and run the simulated data through the same analysis. I do this for real -- I don't trust myself to choose the right statistical analysis, so I always test my chosen analysis with simulated data. If I can fool myself with simulated data, then my real data is probably fooling me too.
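
A minimal sketch of that habit, assuming the analysis is a two-group comparison with a Welch t statistic and a rough |t| > 2 cutoff. The Gaussian data-generating process, the group size of 30, the cutoff, and the function names are all assumptions for illustration:

```python
import random
import statistics

def welch_t(xs, ys):
    # Welch's t statistic for two samples with possibly unequal variances.
    vx = statistics.variance(xs) / len(xs)
    vy = statistics.variance(ys) / len(ys)
    return (statistics.mean(xs) - statistics.mean(ys)) / (vx + vy) ** 0.5

def detection_rate(effect, n=30, sims=2000, threshold=2.0, seed=0):
    # Simulate the assumed process many times and count how often
    # the analysis declares a difference between the groups.
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(effect, 1.0) for _ in range(n)]
        if abs(welch_t(xs, ys)) > threshold:
            hits += 1
    return hits / sims

# No real effect: any "detection" is the analysis fooling us.
false_positive_rate = detection_rate(effect=0.0)
# A known, planted effect: the analysis should usually find it.
power = detection_rate(effect=1.0)
print("false positive rate:", false_positive_rate)
print("power:", power)
```

If the rate with no effect comes out far above what the cutoff is supposed to allow, or the planted effect is rarely found, the analysis is the problem, not the data.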

That is often not possible.

Could we, for instance, collect enough data on typing discipline to end the static/dynamic typing debate once and for all? Enough data to overcome the priors of both static typing and dynamic typing proponents?

We could, but that would require pretty big sample sizes. Like 10,000 developers of various competence, working on 1,000 projects of various domains and difficulties for various amounts of time (from a few days to at least a few months). Who is ever going to fund that?

Until we get such a miracle controlled study, our respective priors will still matter.