Interview on “Bayesian Statistics the Fun Way” (notamonadtutorial.com)
222 points by unbalancedparen 2565 days ago
8 comments

As someone who has a master's degree in statistics and often uses Bayesian statistics, I think we should not focus on whether one is a Bayesian or a frequentist, but rather be pragmatic and take the most practical approach to solving a statistical problem. Moreover, I think statistical education should start with frequentist concepts and then extend them to the Bayesian framework, since the likelihood also plays a major role in obtaining the posterior distribution. In my opinion, this progression is much more natural than starting fully Bayesian.
Respectfully, I find that people who are not statisticians overwhelmingly disagree with your point on which should be taught first.

Bayesian is a natural order of inference for people. The whole concept of the black swan ("all swans are white") proves this out.

Frequentist statistics is much less intuitive to people.

My preference is for people to be able to use some statistics, and Bayesian gets them productive faster.

Frequentist statistics is often pretty poorly taught. Ideas like likelihood, modeling, and optimization underlie the mechanics of both worlds. There's a big obsession with testing, but the Neyman-Pearson testing framework is sound and intuitive.

Bayesian statistics gets a big boost because it's usually taught as a system instead of as a recipe book.

I would argue that the problem with frequentist statistics is that it aligns with humans' flawed intuition of how randomness works. People are inherently obsessed with finding patterns to support their hypotheses.

The problem is that what we perceive as random and extremely unlikely events are in fact much more probable than what we estimate from using Gaussian methods. And the frequentist approach helps to create this distortion by ignoring black swans.

Here's a great video demonstrating how people tend to misunderstand randomness: https://youtu.be/tP-Ipsat90c

One approach gives the right answer. The other approach is more computationally tractable. Computers are pretty powerful now, so we can afford the correct answer much more often than we used to.

As for what is more natural… I've seen a (frequentist) introduction to statistics, and it simply did not make sense. Nothing was justified, you just had to learn the stuff by rote and apply it in situations that look like they could use one tool or another.

Probability theory on the other hand is pretty obvious. The axioms required to derive it are ridiculously few and ridiculously intuitive. From there you get the sum and product rules, and all the rest. Always made perfect sense to me.

I am surprised by how many people equate frequentist statistics with Neyman-Pearson hypothesis testing. In my opinion, the main difference between the two approaches is whether the parameters of a statistical model are considered fixed or random; everything else follows from this.

On the subject of statistical education: The point I tried to make is that I think it is much easier to study the likelihood, the central quantity of frequentist inference, first. One can then go to the Bayesian world simply by allowing the parameters to be random variables. Furthermore, as other commenters have pointed out, technical difficulties arise in the non-conjugate Bayesian setting when MCMC sampling has to be used. In my opinion, MCMC algorithms, convergence diagnostics, etc. are certainly not topics for an intro stats course.
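
A minimal sketch of that progression, assuming a coin-flip (binomial) model. The data (7 heads in 10 flips) and the Beta(2, 2) prior are made up for illustration. The frequentist step maximizes the likelihood over a fixed parameter; the Bayesian step treats the parameter as random and, because the Beta prior is conjugate here, gets the posterior in closed form without any MCMC.

```python
from math import comb

def likelihood(theta, heads, n):
    """Binomial likelihood of `heads` successes in `n` flips for a fixed theta."""
    return comb(n, heads) * theta**heads * (1 - theta) ** (n - heads)

def mle(heads, n):
    """Frequentist point estimate: the theta that maximizes the likelihood."""
    return heads / n

def beta_posterior(heads, n, a, b):
    """Conjugate update: Beta(a, b) prior -> Beta(a + heads, b + tails) posterior."""
    return a + heads, b + (n - heads)

heads, n = 7, 10     # illustrative data, not from the thread
a0, b0 = 2.0, 2.0    # illustrative prior, an assumption

theta_hat = mle(heads, n)
a1, b1 = beta_posterior(heads, n, a0, b0)
posterior_mean = a1 / (a1 + b1)

print(f"MLE: {theta_hat:.3f}")
print(f"Posterior: Beta({a1}, {b1}), mean {posterior_mean:.3f}")
```

The same likelihood function appears in both steps; only the treatment of theta changes.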

Having taught frequentist stats as a TA to grad students, I understand why frequentist stats seems not to make sense. On the other hand, my prior on teaching quality and my data on the relative difficulty of understanding the approaches say with near-certainty that your experience has nothing to do with the approach taken.

Having used Bayesian stats heavily, I'd note that the hard parts are not gone, they are just located elsewhere - in how to actually do the computations, rather than in how to set up problems. Each can be taught poorly or well, but given that MCMC is certainly harder than least-squares, it seems difficult to argue that using Bayesian statistics is easier. (Unless you're just applying the methods by rote and letting the computer spit out answers - and if you are, I don't know why you are better off with Bayesian methods. In fact, if that's what you're doing, please stop doing statistics and pay an expert instead.)
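
To make the difference in effort concrete, here is a rough sketch, assuming the simplest possible model: estimating a single mean with known noise and a flat prior. Least squares is one line. The random-walk Metropolis sampler needs a proposal step size, a burn-in length, and a seed, all of which are illustrative choices here rather than recommendations. The simulated data and every name in the code are made up for the example.

```python
import math
import random

def least_squares_mean(data):
    """Closed form: the sample mean minimizes the sum of squared residuals."""
    return sum(data) / len(data)

def log_posterior(mu, data, sigma=1.0):
    """Log posterior of mu (up to a constant): flat prior, Normal(mu, sigma) noise."""
    return -sum((x - mu) ** 2 for x in data) / (2 * sigma**2)

def metropolis(data, n_steps=20000, step=0.5, burn_in=2000, seed=0):
    """Random-walk Metropolis sampler for mu; step and burn_in are tuning choices."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_posterior(mu, data)
    samples = []
    for i in range(n_steps):
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        if i >= burn_in:
            samples.append(mu)
    return samples

data_rng = random.Random(42)
data = [data_rng.gauss(3.0, 1.0) for _ in range(50)]  # simulated data, true mean 3.0

ls = least_squares_mean(data)
draws = metropolis(data)
mcmc_mean = sum(draws) / len(draws)
print(f"least squares: {ls:.3f}, MCMC posterior mean: {mcmc_mean:.3f}")
```

In this toy case the two estimates should land close together, which is the point: the sampler does far more work, and needs more judgment calls, to reach roughly the same number.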

> given that MCMC is certainly harder than least-squares, it seems difficult to argue that using Bayesian statistics is easier.

Actually, I am not saying Bayesian statistics are easier to use. I was saying they looked easier to understand. Though I must point out that "Bayesian" may be the wrong word here. What truly makes sense to me is Probability Theory, which Edwin T. Jaynes describes pretty well.

(That does not make me any more capable at applying MCMC, which I don't even know of. Searching… Ah, Markov Chain Monte Carlo, yeah that's not easy. Plus, this sounds like an approximation of probability theory… not that we have anything better, mind you: I know that applying probability theory directly is often computationally intractable.)

I agree with this approach, and this is roughly the approach my own Statistics master's degree takes as well. It can be challenging to understand the finer points of likelihoods and posteriors (and how to choose a prior) without serious mathematics that you're unlikely to have upon entering a graduate statistics degree.

Starting with applied probability and applied statistics (incl. regression, ANOVA, GLMs) allows you to solve problems and feel useful and engaged before being thrown into the mathematical rigor required of Bayesian statistics.

I agree, although I respect those who look for deeper justification for the methods we use. Bayesian statistics/decision theory does have axiomatic foundations after all.
So does frequentist stats - they are just different axiomatic foundations and assumptions.
I'm less familiar with them - I've certainly seen many plausible frequentist arguments, but I've never been exposed to any unifying framework which would require that one make decisions based on type-1 error rate controlling hypothesis tests. That's not to say such foundations don't exist, I'm just happy being a philosophical Bayesian who sometimes does frequentist or algorithmic/ML things for practical reasons.
In an introductory course, we should be teaching people to collect enough data that any reasonable choice of prior or method doesn't matter that much.
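
A small sketch of why enough data makes the prior matter less, assuming a coin-flip model with a conjugate Beta prior. The three priors and the true rate of 0.3 are invented for illustration, and the data are idealized to match the true proportion exactly.

```python
def posterior_mean(heads, n, a, b):
    """Posterior mean of a coin's bias under a Beta(a, b) prior."""
    return (a + heads) / (a + b + n)

# Three deliberately different priors (hypothetical choices).
PRIORS = {"skeptical": (1.0, 9.0), "flat": (1.0, 1.0), "optimistic": (9.0, 1.0)}

def prior_spread(n, true_rate=0.3):
    """Gap between the largest and smallest posterior mean across the priors."""
    heads = round(true_rate * n)  # idealized data with exactly the true proportion
    means = [posterior_mean(heads, n, a, b) for a, b in PRIORS.values()]
    return max(means) - min(means)

for n in (10, 100, 10_000):
    print(f"n={n:>6}: spread between priors = {prior_spread(n):.4f}")
```

With these priors the gap shrinks roughly like 8 / (n + 10), so the disagreement that is large at n = 10 is negligible at n = 10,000.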
I started college in 1982. At that time, calculators were common, but not computers. The data sets had to be small enough for us to work problems by hand. Not any more. I see no reason why a stats course can't start out with big bright data sets that are easy to analyze, then advance through more difficult problems where it becomes progressively easier to get things wrong, and thus requires more sophistication to think about problems.

I just want to add a bit more. It's quite easy today to generate and play with random numbers. If you think you understand a process that has generated your data, simulate it and run the simulated data through the same analysis. I do this for real -- I don't trust myself to choose the right statistical analysis, so I always test my chosen analysis with simulated data. If I can fool myself with simulated data, then my real data is probably fooling me too.
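
A sketch of that kind of self-check, under assumptions of my own: the "chosen analysis" is a two-sided z-test with known noise, and the flawed variant re-tests as data arrive and stops at the first significant result. Both are fed simulated data in which there is no effect at all, so every "significant" result is a false alarm. Function names, sample sizes, and seeds are illustrative.

```python
import math
import random

def z_test_significant(sample, sigma=1.0, z_crit=1.96):
    """The 'chosen analysis': two-sided z-test of mean == 0 with known sigma."""
    z = (sum(sample) / len(sample)) / (sigma / math.sqrt(len(sample)))
    return abs(z) > z_crit

def peeking_analysis(sample, start=10, every=10):
    """A flawed variant: test repeatedly as data arrive, stop at the first 'hit'."""
    return any(
        z_test_significant(sample[:k])
        for k in range(start, len(sample) + 1, every)
    )

def false_positive_rate(analysis, n_sims=2000, n=100, seed=1):
    """Feed the analysis simulated null data and count how often it cries wolf."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # no effect, by construction
        hits += analysis(sample)
    return hits / n_sims

honest = false_positive_rate(z_test_significant)
peeking = false_positive_rate(peeking_analysis)
print(f"single test: {honest:.3f}, with peeking: {peeking:.3f}")
```

The single test should come out near the nominal 5%, while the peeking variant fires much more often on the very same simulated data, which is exactly the kind of self-deception the simulation is meant to expose.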

That is often not possible.

Could we, for instance, collect enough data on typing discipline to end the static/dynamic typing debate once and for all? Enough data to overcome the priors of both static typing and dynamic typing proponents?

We could, but that would require pretty big sample sizes. Like 10,000 developers of various competence, working on 1,000 projects of various domains and difficulties for various amounts of time (from a few days to at least a few months). Who is ever going to fund that?

Until we get such a miracle controlled study, our respective priors will still matter.

As someone who uses statistics all the time at work, I sympathize so much with this article and greatly enjoyed it. Every time I try to introduce a Bayesian prior, coworkers either look at me like I'm crazy (because they've never heard of or used Bayesian stats) or like I've suddenly gone soft and introduced a bunch of nebulous, touchy-feely context into the objective truth (if they're dedicated frequentists).

Then we promptly switch back to p-values of .05, a lot of the time not even bothering with a statistical power calculation. I've had better success with introducing power, though. I suspect that's because we can fit it into the existing frequentist framework.

> like I've suddenly gone soft and introduced a bunch of nebulous, touchy-feely context into the objective truth

This drives me nuts. If you haven't, check out the paper "Beyond subjective and objective in statistics" by Gelman and Hennig (2017).

Right at the beginning they make the point that any analysis includes external information in many ways, such as adjusting variables for imbalance, how we deal with outliers, regularization, etc.

Especially if you're doing any sort of causal inference, you're usually making strong assumptions before estimating your model, even just in terms of which variables are included and how they're connected. The idea that priors are somehow ruining an "objective" model is just absurd to me. You're already making so many other decisions about your model that will affect estimates and your interpretation of them. Priors seem like another perfectly reasonable decision to have to make as well, with the benefit of getting results that I think in general are much more easily understood by a lay audience. (E.g., I don't think I've ever encountered someone not on my data science team that actually understands what a p-value is. But people are much better at understanding when I say, there's an X percent chance that there is a positive effect here.)
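
For what it's worth, here is a hedged sketch of where an "X percent chance of a positive effect" statement can come from, assuming a simple A/B comparison of two conversion rates with flat Beta(1, 1) priors. The counts are invented, and `prob_b_beats_a` is a hypothetical helper, not anything from the article.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A | data) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Each posterior is Beta(1 + successes, 1 + failures).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws

# Made-up data: 40/1000 conversions for A, 55/1000 for B.
p = prob_b_beats_a(conv_a=40, n_a=1000, conv_b=55, n_b=1000)
print(f"P(B better than A) is about {p:.0%}")
```

The output is a direct probability statement about the effect, conditional on the model and the priors, which is the quantity lay audiences tend to assume a p-value is.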

This critique might come from the idea that having a good analytic model, or at least some valuable analytic insights, involves much more than assigning some priors. Of course, the two things don't exclude each other, but for some frequentists Bayesians have the wrong perspective - or at least that's the critique, whether it's true or not.

Another issue that I personally have with Bayesianism is that I believe that assigning probabilities to singular events is only meaningful and admissible at all if there is a good analytic explanation for the respective propensity. For example, we may be able to deduce that a die is reasonably fair from the way it is constructed and our knowledge of physics, and later confirm this by frequentist analysis. Merely believing or claiming that the die is fair is not acceptable. Again, the difference is only one of attitude in the end, I suppose.

Maybe philosophers have given Bayesian statistics a bad rap, too, because many of those who call themselves Bayesians are also "probabilists", i.e., they think that rational belief must conform to the probability calculus. There are many arguments against probabilism and the only arguments that speak for it are Dutch book arguments. The view does not have very strong foundations.

> assigning probabilities to singular events is only meaningful and admissible at all if there is a good analytic explanation for the respective propensity.

Wait a minute, you are making a type error here: probabilities are not propensities. They're degrees of belief. (And even if you disagree in general, this is a Bayesian context you're talking about.)

If I put a die on a table and hide it with a cup, you could still estimate your probability distribution about which face is up. My probability distribution would obviously be very different, since I put the die in there myself. (Replace "probability" by "betting ratio" or "degrees of belief" if it makes more sense to you.)

> The [probabilism] view does not have very strong foundations.

Read the first 2 chapters of Probability Theory: the Logic of Science, by E. T. Jaynes: "Plausible reasoning" and "The quantitative rules". It's very accessible, and you shall see how strong the foundations really are.

http://www.med.mcgill.ca/epidemiology/hanley/bios601/Gaussia...

No, I was not speaking from a Bayesian perspective, I was laying out the propensity-theoretic explanation of probability. The propensity explanation is one of the attempts to explain why singular events might be said to give rise to probabilities, living alongside frequentism and Bayesianism. Another perspective worth mentioning is the logical approach, which is in the end purely combinatorial.

Some people think that you need to explain why a die can be fair, rather than just assuming it or only looking at it from a frequentist perspective. Of course, die-hard Bayesians don't think so, but that would be begging the question in the context of discussing criticisms of Bayesianism.

> Read the first 2 chapters of Probability Theory: the Logic of Science, by E. T. Jaynes: "Plausible reasoning" and "The quantitative rules". It's very accessible, and you shall see how strong the foundations really are.

I'm an expert on this topic. The only arguments for probabilism are Dutch book arguments, and there is a large number of arguments against these. See for example various articles by Hajek. Alternative representations of graded belief are, among others:

- plausibility theory (Halpern et al.)

- possibility theory (Dubois & Prade)

- Haas-Spohn ranking theory and variants thereof

- various notions of epistemic entrenchment

- Dempster-Shafer belief theory

- almost any quantitative or qualitative representation of belief in belief revision theory not covered by one of the above theories (e.g. belief update by Katsuno & Mendelzon)

- by a general logical connection, nonmonotonic logics and AAFs can generally represent notions of belief update, such that the underlying qualitative ordering of states is a representation of graded belief

What you probably mean is that the above generalizations (or qualitative theories, in some cases) could be simulated with probabilities, e.g. by using convex sets of probabilities or what Josang is doing in his "subjective logic". That's true, but then we're no longer talking about probabilism in the sense I've used the word.

Of course, you can also try arguing for probabilism like Savage did: Lay out a set of postulates for your subjective plausibility that happen to allow you to prove that this notion of subjective plausibility is in the end probability. Despite the merits of such work, it is in the end a form of cheating (or "reverse engineering"), because you could just as well come up with plausible postulates that yield the weaker axioms of possibility theory.

> No, I was not speaking from a Bayesian perspective, I was laying out the propensity-theoretic explanation of probability.

Unless you can explain this "propensity" in terms of actual physical properties, propensity by itself is… unjustified. The only domain I know of so far where we could possibly argue propensities are a thing is quantum mechanics. And even then it seems to rest on an anthropic argument: which universe am I living in?

> Some people think that you need to explain why a die can be fair,

A die by itself is not fair, right? A die might be balanced, and the way it is thrown it might have enough unpredictable variability to cause everyone in the room to think "uniform distribution over [1..6]".

Likewise, a cryptographic pseudo random generator is unpredictable (and thus "fair"), to anyone who doesn't know its internal state. Even though the process itself is deterministic, it's just not computationally feasible to guess its output just from the observation of past outputs. (Though for this one I'm relying on the fact we're not logically omniscient.)

> I'm an expert on this topic.

Good. Then you know that any inference strategy that falls prey to Dutch Books is not rational. Right?

To be fair, probability theory is not computationally tractable. I did not verify, but I guess any feasible approximation is vulnerable to some more or less subtle Dutch Books.

Now the way you talk about Dutch Books sounds like all the other strategies you mention are vulnerable, not just in practice, but in theory as well. They are thus not perfectly rational. Do their authors at least have the grace to admit this is a flaw that should be corrected?

But then I suspect that correcting the flaw inevitably leads to probability theory itself: if you accept Jaynes's three "desiderata" as required for any kind of rational reasoning, as he shows, the result is necessarily equivalent to probability theory as we know it (where probabilities are subjective assessments of plausibility, otherwise known as "degrees of belief").

I can only conclude that you do not accept Jaynes's desiderata as necessary for correct inference. And this is the point where I look at you like you're not quite sane.

For reference, Jaynes's desiderata:

  (1) Degrees of plausibility are represented by real
      numbers. (And a continuity assumption.)

  (2) Qualitative correspondence with common sense.
      (explained in more detail in the book)

  (3a) If a conclusion can be reasoned out in more than
       one way, then every possible way must lead to the
       same result.

  (3b) The robot always takes into account all of the
       evidence it has relevant to a question. It does
       not arbitrarily ignore some of the information,
       basing its conclusions only on what remains. In
       other words, the robot is completely
       non-ideological.

  (3c) The robot always represents equivalent states of
       knowledge by equivalent plausibility assignments.
       That is, if in two problems the robot’s state of
       knowledge is the same (except perhaps for the
       labeling of the propositions), then it must assign
       the same plausibilities in both.
Good luck convincing me (and, I suspect, the majority of people, including frequentist statisticians) that we should reject any of these desiderata.

I don't care that it's reverse engineering, those desiderata match the way I think. I accept the conclusion that probability theory is the correct (albeit intractable) way to think, because I ultimately agree with the postulates it rests on. Vehemently so. They're not just true, they're obvious.

If you don't accept them, then I can only give up, and remember what Yudkowsky once wrote: "How do you argue a rock into becoming a mind?"

My understanding of physics is that no die toss can be considered "fair" because such a macroscopic system behaves deterministically according to Newton's laws, and isn't even too chaotic to model accurately. No matter the shape or balance of the die, the outcome is determined by the initial conditions and the toss. A skilled gambler can make a fair die land however they want.

The only thing I know is that a well-made die is symmetrical, and so if I have no prior knowledge of its initial orientation then I have to use a uniform prior because nothing else has the requisite symmetry group.

The same could be said for a die that is just sitting on the table without having been observed by me yet, no toss needed.

> A skilled gambler can make a fair die land however they want.

No, they can't. Dice control is a myth, and there isn't a single study that backs it up.

> The idea that priors are somehow ruining an "objective" model is just absurd to me.

I think some caution can be justified to a certain extent (not the blind "emotional" objections). When establishing priors in a low data regime, one must necessarily be careful. The prior is a knob, and where it puts its mass can change the conclusion of the inference a lot. That said, if we trust our belief about the region the available data do not inform us well about, why not utilize our domain knowledge/belief?

I think the Swedish Fish approach is a particularly fun way: https://www.youtube.com/watch?v=3OJEae7Qb_o
I love the idea of making an approachable version of Ed Jaynes’s classic.
"For coin tosses both schools of thought work pretty well"

How many coin tosses in a row have to land heads before a frequentist decides that the coin is unfair?

6, if it's a two-sided test.
Explain?
I'm guessing 1 flip to pick Heads or Tails, and then 5 flips to get a 'good' p-value (2^-5 = 0.03125 < 0.05)
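
That arithmetic can be checked directly. A minimal sketch, assuming an exact two-sided test of a fair coin at the conventional 0.05 level; the function names are made up for illustration.

```python
def two_sided_p_all_same(n):
    """Exact two-sided p-value for n identical outcomes in n flips of a fair coin."""
    # All heads and all tails are the two equally extreme outcomes: 2 * (1/2)**n.
    return min(1.0, 2 * 0.5**n)

def min_flips_to_reject(alpha=0.05):
    """Smallest run of identical flips whose two-sided p-value drops below alpha."""
    n = 1
    while two_sided_p_all_same(n) >= alpha:
        n += 1
    return n

for n in range(4, 8):
    print(n, two_sided_p_all_same(n))
print("reject after", min_flips_to_reject(), "identical flips")
```

Five heads in a row gives 2 * 2^-5 = 0.0625, which is not below 0.05; six gives 2 * 2^-6 = 0.03125, which is. That matches the "one flip to pick the side, five more to get the p-value" reading.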
> First of all, p-values are not the way sane people answer questions

I think they are pretty close to the way sane people answer some kinds of questions.

Can anyone recommend a 'Bayesian statistics the hard way' book?
For the hard way, look at Bruno de Finetti's Theory of Probability:

https://onlinelibrary.wiley.com/doi/book/10.1002/97811192863...

Jaynes is certainly very deep and some sections are harder than others. It's interesting regardless of your level (this is a book worth rereading several times).

For a less technical, but full of insight, introduction see Dennis Lindley's Understanding Uncertainty:

https://onlinelibrary.wiley.com/doi/book/10.1002/97811186501...

Bayesian Data Analysis by Andrew Gelman

http://www.stat.columbia.edu/~gelman/book/

One vote for BDA. For programmers who learn better by implementing things, this book [1] is also good:

[1]: https://www.amazon.com/Bayesian-Methods-Hackers-Probabilisti...

Parts of that book are available online[1] for free. If not for that book I would never have understood how to apply Bayesian stats to problems that interested me.

[1] http://camdavidsonpilon.github.io/Probabilistic-Programming-...

Probability Theory: The Logic of Science by Edwin Jaynes
http://www.med.mcgill.ca/epidemiology/hanley/bios601/Gaussia...

But really, the first two chapters aren't that hard.

Statistical Rethinking: A Bayesian Course with Examples in R and Stan is also considered pretty good.
Thank you all!
I got in an argument with a friend (a mechanical/electrical engineer) who knew about Bayesian statistics. My other friend, a PhD in statistics with whom I had had many discussions because of both personal and work interests, had supplied me with my modicum of statistics knowledge.

My engineer friend called my PhD friend a "frequentist", like it was a dirty word, despite only having one, maybe two, classes in college about Bayesian math/statistics/whatever (my ignorance).

This quote jumped out at me in the article:

"I wanted to write a book on Bayesian statistics that really anyone could pick up and use to gain real intuitions for how to think statistically and solve real problems using statistics."

In the context of the statement, it sounds like he is claiming that any non-Bayesian statistics is useless (or, at best, less valuable or reliable than Bayesian analysis)?

Having known Will when I lived in Reno, I'm certain your focus should be on "anyone could pick up and use" and not any statement about the usefulness of other approaches. The Will I know is fundamentally about teaching things in very easy to understand ways, and curious about all approaches to solving a problem.
It just reads to me like he wants to make statistics accessible to a wide audience.
That's not how I'm reading that quote at all. Saying Bayesian stats can solve real problems doesn't imply frequentist stats can't.