by nvrmnd
1211 days ago
> should I be more worried that there's something invisibly going wrong there?

I think what anyone should be cautious about is assuming that these more sophisticated methods let MCMC scale up significantly to larger problems. They probably buy you a 2-10x improvement at best, and only if the method is well suited to your distribution (however you characterize that). Often that's not enough to keep up with the dimensionality of the models people want to use. I remember struggling to get this to work on a Bayesian neural network with maybe 100 parameters, tiny by today's standards. As for approximating hyper-parameters, again, if it's an easy problem, sure, something will probably work.

But I do have to say I find this really fascinating once you get into the harder problems with MCMC algorithms. The first thing that comes up is: how do you know that a sampler is drawing samples from the correct distribution? How exactly do you quantify that? Usually each sampler takes the same amount of time to produce a sample, so what you're really comparing is which one gives "better" samples. But they are all supposed to be drawing from the same distribution, so how can one sample be better than another? And if they are not drawing from the same distribution, how can you possibly know which one is the right one? I'm simplifying the problem here, and I don't mean to disparage any of the fine work on autocorrelation, mixing rates and all that, but you have to admire just how impossible this is in general.
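To make the "how do you quantify that?" question a bit more concrete, here is a minimal sketch of one common partial answer: run several chains from dispersed starting points and compare between-chain and within-chain variance (the Gelman-Rubin R-hat). Everything here is an illustrative assumption, not something from the thread: the standard normal target, the hand-rolled random-walk Metropolis sampler, the step sizes, and the helper names `rw_metropolis` and `r_hat`. Note that this only detects chains that disagree with each other; an R-hat near 1 does not prove the sampler targets the right distribution, which is exactly the difficulty described above.

```python
import numpy as np

def rw_metropolis(log_prob, x0, n_steps, step, rng):
    """Random-walk Metropolis for a 1-D target given its log-density."""
    x, lp = x0, log_prob(x0)
    out = np.empty(n_steps)
    for i in range(n_steps):
        prop = x + step * rng.normal()          # symmetric Gaussian proposal
        lp_prop = log_prob(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # Metropolis accept/reject
            x, lp = prop, lp_prop
        out[i] = x
    return out

def r_hat(chains):
    """Gelman-Rubin potential scale reduction for an (m, n) array of chains."""
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()       # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)     # between-chain variance
    var_plus = (n - 1) / n * W + B / n          # pooled variance estimate
    return np.sqrt(var_plus / W)

def log_prob(x):
    # Hypothetical target: standard normal, up to an additive constant.
    return -0.5 * x**2

rng = np.random.default_rng(0)
starts = [-10.0, -3.0, 3.0, 10.0]               # deliberately dispersed starts

# Reasonable step size: chains forget their starting points quickly.
good = np.stack([rw_metropolis(log_prob, s, 5000, 2.5, rng)[1000:] for s in starts])
# Tiny step size: every sample is "accepted", but chains barely move.
bad = np.stack([rw_metropolis(log_prob, s, 5000, 0.01, rng)[1000:] for s in starts])

print("R-hat, step=2.5 :", r_hat(good))
print("R-hat, step=0.01:", r_hat(bad))
```

Both samplers here cost the same per sample and both leave the target invariant in theory, yet the second one has not explored it in any practical sense. The diagnostic catches this case only because the chains started far apart; chains that are all stuck in the same wrong region would still agree with each other.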