Hacker News
by vismwasm 907 days ago
I never really got Bayesian statistics to be honest.

- When sample size grows, frequentist and bayesian (if the prior is not too restrictive) point estimates seem to converge to each other anyway

- The distribution of your point estimate (frequentist) vs. the estimated distribution (bayesian) don't seem to differ much either

- When the sample size is small the Bayesian prior dominates

- Interestingly, when I see Bayesians simulate random data (to introduce the concepts on this data) they usually assume a true parameter value. E.g. when sampling from Y = a + b * X + e, they'll assume fixed, true values of a and b and not random variables - which is a frequentist assumption! So far I've never seen e.g. b being sampled from Normal(mu=2, sigma=1) instead of just setting b=2.

- The frequentist assumption of a true population value which we try to estimate just makes sense to me. For example there is a true mean income over the working population. It's not a random variable but a fixed value which can be computed if we just asked every single working person for their income and then compute the mean over all values.

I tried getting into Bayesian stats but honestly it just seems overkill for most cases. For a simple regression computing b_hat = inv(X'X)X'Y is just faster and easier than numerically sampling traces. Bayesian forces you to think about the data generating process - I appreciate that, but you need to do the same when it comes to frequentist stats, it's just a little less obvious.
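To make the first and third bullets concrete, here is a minimal sketch, not anyone's actual workflow. It assumes a slope-only model y = b*x + e with known noise sigma and a Normal prior on b, so the posterior mean has a closed form and no sampling is needed. The function name, the prior Normal(0, 1) and the seed are all illustrative assumptions.

```python
import random

def ols_and_posterior_slope(n, true_b=2.0, sigma=1.0,
                            prior_mean=0.0, prior_sd=1.0, seed=0):
    """Simulate y = true_b * x + e; return (OLS slope, posterior mean of slope)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [true_b * x + rng.gauss(0.0, sigma) for x in xs]
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))

    ols = sxy / sxx  # scalar case of inv(X'X) X'y

    # Conjugate normal update: precisions add, means are precision-weighted
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = sxx / sigma ** 2
    post_mean = (prior_prec * prior_mean + sxy / sigma ** 2) / (prior_prec + data_prec)
    return ols, post_mean

for n in (5, 50, 5000):
    ols, post = ols_and_posterior_slope(n)
    print(n, round(ols, 4), round(post, 4))
```

In this setup the gap between the two estimates is |prior_mean - ols| times prior_prec / (prior_prec + data_prec), so it shrinks as the data precision grows and is largest when n is small.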

6 comments

> When sample size grows, frequentist and bayesian [...] estimates seem to converge to each other anyway

Yes. And so? Bayesians would argue (and I quote) that "the interesting limit in statistics is when the number of samples tends to one. The limit when the number of samples tends to infinity is completely useless."

> I tried getting into Bayesian stats but honestly it just seems overkill for most cases.

There are 3 black balls and 7 white balls in an opaque bag. How likely is it to pick a black ball? Bayesian statistics gives a straightforward answer (you just assume an uninformative prior and perform a computation). But frequentist statistics starts to argue about an infinite number of replicas of your own universe and other nonsensical constructions. Not sure that the Bayesian approach is overkill in that case...

> Yes. And so? Bayesians would argue (and I quote) that "the interesting limit in statistics is when the number of samples tends to one. The limit when the number of samples tends to infinity is completely useless."

The "and so?" is answered right after that. The prior dominates, which is a bad thing.

As the amount of data tends to 0 (idk why the quote is using 1), of course your belief tends to whatever your belief was before you saw any data. What else could it possibly tend to? Of course it's very sad that we don't have any data, but that's no fault of Bayesian.
> As the amount of data tends to 0 (idk why the quote is using 1)

The smallest number of samples you can use is 1, isn't it? If you have 0 samples then you do nothing because you have no data. Is there a way to have half a sample?

> of course your belief tends to whatever your belief was before you saw any data

Your beliefs should tend to that, sure, but if you're trying to produce an actual number for sharing then your beliefs shouldn't be a huge factor, and an uninformative prior being a huge factor is also bad.

For numbers that leave my head/notebook, I'd rather keep the new evidence by itself and say it's weak.

Does Bayesian have a concept for absence of belief? I don't feel like believing anything is equally likely is equivalent to absence of belief. But maybe it is?
There is a concept of minimum knowledge (maximum entropy). There is a concept of invariance (like translation invariance where you have no reason to prefer one position to another because the origin could be anywhere - or scale invariance where the value of a magnitude could be high or low if you don't know anything about the unit of measurement).

I'm not sure if by "absence of belief" you mean "ignorance" or something else.

I think about something like known ignorance. I know that I don't know anything about this, thus I refuse to have any belief about what it might be, as I know any belief would be unwarranted.
Why do you think it's a bad thing for your beliefs to remain the same in the absence of new data?
Should you have any beliefs in the absence of data? And if you have some prior data but no additional data now, why carve out the past as a separate thing and call it prior? Why not just call everything you have data?
That's precisely how Bayesian inference works! But rather than having to repeat all analysis of prior data sets, we summarize that analysis in the form of a posterior, which becomes the prior for the next analysis.
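A tiny sketch of that point, assuming a Beta prior on a success probability with binomial data (the counts and the uniform Beta(1, 1) starting prior are made up for illustration). Analysing two batches one after the other, with the first posterior reused as the second prior, gives the same result as analysing all the data at once.

```python
def update_beta(alpha, beta, successes, failures):
    """Conjugate update of a Beta(alpha, beta) prior with binomial data."""
    return alpha + successes, beta + failures

prior = (1.0, 1.0)   # uniform prior, an illustrative choice
batch1 = (3, 7)      # (successes, failures), made-up counts
batch2 = (12, 8)

# Analyse batch 1, then use its posterior as the prior for batch 2
after_batch1 = update_beta(*prior, *batch1)
sequential = update_beta(*after_batch1, *batch2)

# Analyse everything in one go
pooled = update_beta(*prior, batch1[0] + batch2[0], batch1[1] + batch2[1])

print(sequential, pooled)
```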
I didn't say anything about what should happen to your beliefs.
You said that the prior dominating is a bad thing. The prior is your beliefs about a parameter prior to observing data (as I suspect you know!). Maybe I'm not getting what you're saying.
I was assuming these are general purpose statistics, which means you might want to share them with someone. It's bad for those to get tainted by your personal priors. If it's a purely personal calculation then sure that's fine.
Bayesian statistics, the way Andrew Gelman practices it, comes naturally when you are interested in generative models of data. You can still use maximum likelihood estimates, but these become fragile when you have hierarchical / multilevel models.

Multilevel models are fantastic to address a problem that is often ignored by frequentist approaches, the need for shrinkage and information sharing. This pops up all the time in modern statistics. For example, if you test 1000 hypotheses, calculating p-values and adjusting these with some multiplicity correction scheme is not sufficient.

You should borrow information across random variables with a multilevel model to avoid estimating exaggerated effects in tests whose outcome is deemed to be significant. Andrew Gelman's post is concerned with this topic.

Another point is that Gelman et al. use weakly informative hyperpriors. These are not really subjective. If anything, they usually regularize solutions by pushing effects towards zero. Plus, on multilevel models, priors are only needed on hyperparameters.
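A rough sketch of the shrinkage argument, using a simple empirical-Bayes normal-normal model as a stand-in for a full multilevel model (this is a swapped-in simplification, not Gelman's method). Assumptions: 1000 true effects drawn from Normal(0, tau), each measured once with unit standard error, and a population mean known to be zero. All names and numbers are illustrative.

```python
import random
import statistics

def shrinkage_demo(n_tests=1000, tau=0.5, seed=1):
    rng = random.Random(seed)
    true_effects = [rng.gauss(0.0, tau) for _ in range(n_tests)]
    estimates = [rng.gauss(t, 1.0) for t in true_effects]  # unit standard error

    # Method-of-moments estimate of the between-test variance tau^2
    tau2_hat = max(0.0, statistics.variance(estimates) - 1.0)
    weight = tau2_hat / (tau2_hat + 1.0)  # shrinkage factor toward zero
    shrunk = [weight * y for y in estimates]

    # Keep only the tests that look "significant" at |z| > 1.96
    picked = [i for i, y in enumerate(estimates) if abs(y) > 1.96]
    raw_err = statistics.mean([abs(estimates[i] - true_effects[i]) for i in picked])
    shrunk_err = statistics.mean([abs(shrunk[i] - true_effects[i]) for i in picked])
    return len(picked), raw_err, shrunk_err

n_picked, raw_err, shrunk_err = shrinkage_demo()
print(n_picked, round(raw_err, 3), round(shrunk_err, 3))
```

Among the selected tests the raw estimates overstate the true effects, and the shrunken ones land much closer. A multiplicity correction changes which tests get selected but does not change the estimates themselves.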

I use mixed level models for longitudinal analysis pretty regularly. There the point has been to account for correlated dependent observations (e.g. repeated variables within a participant).

However it seems that you are suggesting another use. If I have 10 cognitive measures each measured once in my subjects, the default has been to do a multiple comparison correction, either FDR or FWER on 10 tests. We know that the 10 tests are not truly independent, so Bonferroni is probably too conservative.

It seems here you suggest running this with test being a random effect. I've seen this approach with item level data in a task, but I didn't really think to do it when the tests are not from the same battery or construct. And more to the point, this fixed effect model would be of no particular interest, while random effect CIs are difficult to estimate. So I am left a bit confused.

I think the attraction of Bayesianism is kind of philosophical / aesthetic: it is a principled, sound and beautiful approach. It's kinda nice that it extends Occam's razor and translates it into numbers.

Yes, frequentist statistics works very well in practice, but it's a bit ad hoc and suffers from various problems. For example, if you estimate velocity and separately estimate kinetic energy, you can get values that are incompatible with each other, which is kinda ugly and non-intuitive and makes you want to dig deeper into how such a thing happened.

Bayesianism has the answers.

Also sometimes it really does matter like in medicine, where some conditions have a very low prior probability.
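A quick worked example of the low-prior case, as a sketch with made-up numbers: a condition with 0.1% prevalence and a test with 99% sensitivity and 95% specificity. The function name and all three figures are assumptions for illustration only.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(condition | positive test) via Bayes' rule."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)

p = posterior_given_positive(prevalence=0.001, sensitivity=0.99, specificity=0.95)
print(round(p, 4))
```

With these numbers a positive result still leaves the probability of having the condition at roughly 2%, because false positives from the large healthy group swamp the true positives.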

> but it's a bit ad hoc

That's how many people feel about Bayesian methods when trying to pick an initial prior.

I mean that's what's so good about them. The prior is a bit arbitrary and up for discussion. And Bayesianism is honest about it, it highlights that fact rather than trying to play fast and loose and sweep the issue under the rug.
> The distribution of your point estimate (frequentist) vs. the estimated distribution (bayesian)

Ideally one should use the whole posterior distribution of your model parameters which is not the case for point estimates.

> So far I've never seen

Because people are lazy.

Bayesian works great if you have great knowledge of your field and can fine-tune everything. Frequentist stats just works and is easily interpretable, but it's easy to make mistakes, esp. when starting out.

> Ideally one should use the whole posterior distribution of your model parameters which is not the case for point estimates.

This is a historical issue because of some hard-headed frequentist founders, but in modern days the frequentist concept of confidence distribution is gaining acceptance, which is the proper frequentist equivalent of the posterior, so this distinction between Bayesian and Frequentist is disappearing.

Rather than giving specific point estimates or interval estimates, calculating a frequentist confidence distribution allows you to compute confidence intervals for all possible confidence levels, just like the posterior does. See this excellent review paper for more info on this: https://statweb.rutgers.edu/mxie/RCPapers/insr.12000.pdf

The key insight is that a confidence distribution is an estimator for the parameter of interest, instead of an inherent distribution of the parameter.

The confidence distribution is generally derived from normalizing a likelihood function, and the likelihood function is arguably the proper underlying concept that provides a link to both Bayesian and frequentist inference, per https://en.wikipedia.org/wiki/Likelihood_principle
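As a small illustration, here is a sketch for the simplest case I can think of: the mean of normal data with known sigma. There the confidence distribution is Normal(sample mean, sigma / sqrt(n)), which is also what you get by normalizing the likelihood in mu. The data values and helper names below are made up.

```python
from statistics import NormalDist, mean

def confidence_distribution(sample, sigma):
    """Confidence distribution for a normal mean with known sigma."""
    return NormalDist(mu=mean(sample), sigma=sigma / len(sample) ** 0.5)

def interval(cd, level):
    """Equal-tailed confidence interval read off the confidence distribution."""
    tail = (1.0 - level) / 2.0
    return cd.inv_cdf(tail), cd.inv_cdf(1.0 - tail)

sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.4, 5.6, 4.9]  # made-up data
cd = confidence_distribution(sample, sigma=1.0)
for level in (0.5, 0.9, 0.95, 0.99):
    lo, hi = interval(cd, level)
    print(level, round(lo, 3), round(hi, 3))
```

One object gives the interval at every level, which is the sense in which it plays the role of a posterior while remaining a statement about the estimator rather than about the parameter.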
> the frequentist concept of confidence distribution is gaining acceptance, which is the proper frequentist equivalent of the posterior, so this distinction between Bayesian and Frequentist is disappearing

The major distinction remains: Frequentist confidence intervals are something quite different from Bayesian credible intervals. I don't think that having a distribution that can be used to calculate any desired confidence interval - like the posterior distribution can be used to calculate different credible intervals - changes much.

The Bayesian formula is dreadfully useful in machine learning, in the modeling of generative problems. However, because the integration it requires is often computationally intractable, we usually have to use approximations instead of true Bayesian stats.
> Interestingly, when I see Bayesians simulate random data (to introduce the concepts on this data) they usually assume a true parameter value. E.g. when sampling from Y = a + b * X + e, they'll assume fixed, true values of a and b and not random variables - which is a frequentist assumption! So far I've never seen e.g. b being sampled from Normal(mu=2, sigma=1) instead of just setting b=2.

The Bayesian philosophy of "random parameters" does not mean that Bayesian methods cannot be assessed for frequentist properties or compared against frequentist procedures.
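Both styles of simulation are easy to run side by side. A hedged sketch, assuming a slope-only model y = b*x + e with known sigma = 1 (the intercept a is dropped for brevity) and the Normal(mu=2, sigma=1) prior from the quote. It checks how often a 90% credible interval contains the b that generated the data, once with b fixed at 2 and once with b drawn from the prior. Function names, sample sizes and seeds are illustrative.

```python
import random
from statistics import NormalDist

PRIOR = NormalDist(mu=2.0, sigma=1.0)  # the Normal(mu=2, sigma=1) from the quote
Z90 = NormalDist().inv_cdf(0.95)       # multiplier for a 90% equal-tailed interval

def covers(b, n, rng, sigma=1.0):
    """Simulate y = b*x + e; report whether the 90% credible interval contains b."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [b * x + rng.gauss(0.0, sigma) for x in xs]
    # Conjugate normal posterior for the slope
    post_prec = 1.0 / PRIOR.variance + sum(x * x for x in xs) / sigma ** 2
    post_mean = (PRIOR.mean / PRIOR.variance
                 + sum(x * y for x, y in zip(xs, ys)) / sigma ** 2) / post_prec
    half = Z90 / post_prec ** 0.5
    return post_mean - half <= b <= post_mean + half

def coverage(draw_b, reps=2000, n=10, seed=0):
    """Fraction of simulated datasets whose interval contains the generating b."""
    rng = random.Random(seed)
    return sum(covers(draw_b(rng), n, rng) for _ in range(reps)) / reps

fixed_b = coverage(lambda rng: 2.0)                    # b set to one "true" value
random_b = coverage(lambda rng: rng.gauss(2.0, 1.0))   # b drawn from the prior
print(fixed_b, random_b)
```

When b is drawn from the same prior used in the analysis, the interval should cover about 90% of the time on average. Fixing b answers the frequentist question of how the same procedure behaves at one particular parameter value.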