by mattb314
1971 days ago
I think I generally agree with the majority of the comments here that burn-in can serve a useful purpose (especially if you can't find a high-probability-density point to start from), but I also wonder: if burn-in vs. no burn-in makes a large difference in your outcome, aren't you likely just not running your chain long enough? Sure, if you choose a bad starting point, your initial samples might not be representative of the overall distribution, but if a handful of non-representative points can massively impact your result, then I'm not sure how stable your result was to begin with (how do you know there isn't some other set of low-probability, high-impact points that your sampler just missed through luck?). People tend to have a cognitive bias towards distributions looking pretty (e.g. not having random chains off to the side as in the article), but I'm not sure it makes a real difference. That said, I do think burn-in is a pretty reasonable way to find a good starting point if you don't have existing knowledge about the distribution. From a practical standpoint, has anyone actually seen a massive difference between runs with/without burn-in? Kinda curious how often it really matters.
You're right, and most comments I've seen over the years on the post conveniently miss that he addresses that:
> This unbiasedness argument is rubbish. If you start at x and I start at x then your MCMC run is no better than mine. If you used burn-in and I didn't, then you are entitled to woof about approximate unbiasedness and I am not. But that woof does not make your estimator any better.
My interpretation has always been this, and I think it's correct: You need a good starting point. There's no reason to think burn-in gives you a good starting point. Instead, use something that's actually intended to give a good starting point, like the mode.
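If anyone wants to poke at this themselves, here's a rough sketch rather than anything authoritative: a plain random-walk Metropolis sampler on a standard normal target, run once from a deliberately bad start and once from the mode. The target, the step size, the start at 50, the chain length, and the 1000-sample burn-in are all arbitrary choices of mine for illustration, not anything from the article. It prints the estimated mean for each case so you can see how much the initial transient moves the answer, and how that changes if you shorten `n_steps`.

```python
import math
import random


def log_density(x):
    # Unnormalized log density of a standard normal target (illustrative choice).
    return -0.5 * x * x


def metropolis(start, n_steps, step_size=1.0, seed=0):
    # Random-walk Metropolis with a Gaussian proposal; returns every visited state.
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        log_accept = log_density(proposal) - log_density(x)
        # Accept uphill moves always, downhill moves with probability exp(log_accept).
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            x = proposal
        samples.append(x)
    return samples


def mean(xs):
    return sum(xs) / len(xs)


# Same sampler, two starting points: far out in the tail vs. at the mode.
bad_start = metropolis(start=50.0, n_steps=20000)
mode_start = metropolis(start=0.0, n_steps=20000)

print("bad start, no burn-in:  ", mean(bad_start))
print("bad start, 1000 burn-in:", mean(bad_start[1000:]))
print("mode start, no burn-in: ", mean(mode_start))
```

The true mean here is 0, so the gap between the first two numbers is the damage done by the transient. Starting at the mode skips that transient without throwing any samples away, which is the point being made above.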