Hacker News
by ailun 1212 days ago
To be clear, I don't disagree with you about MCMC. But wouldn't your first two pitfalls be taken care of, as much as possible, by using a widely used library's implementation? All the libraries I'm aware of tune the HMC hyperparameters automatically - should I be more worried that there's something invisibly going wrong there? And your third pitfall applies to MCMC in general, not HMC in particular.
2 comments

> should I be more worried that there's something invisibly going wrong there?

I think what anyone should be cautious about is thinking that these more sophisticated methods let MCMC scale up significantly to larger problems. They probably buy a 2-10x improvement at best, and only if the method is well suited to your distribution (however you characterize that). Often that's not enough to keep up with the dimensionality of the models people want to use. I remember struggling to get this to work on a Bayesian neural network that had maybe 100 parameters, tiny by today's standards.

As for tuning hyper-parameters, again, if it's an easy problem, sure, probably something will work. But I do have to say I find this really fascinating when you start to get into these harder problems with MCMC algorithms. The first thing that comes up is: how do you know that a sampler is drawing samples from the correct distribution? How exactly do you quantify that? Usually each sampler takes the same amount of time to produce a sample, so what you're really comparing is which one is giving "better" samples. But they are all supposed to be drawing from the same distribution, so how can one sample be better than another? And if they are not drawing from the same distribution, then how can you possibly know which one is the right one?
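
To make "better" a bit more concrete, here is a minimal sketch, assuming one common framing (effective sample size) rather than anything specific to the samplers discussed here. Two AR(1) chains stand in for two samplers: both have N(0, 1) as their stationary distribution, but one is far more autocorrelated. The names `autocorr`, `effective_sample_size`, and `ar1_chain`, and the rule of truncating at the first non-positive autocorrelation, are illustrative choices, not any library's API.

```python
import numpy as np

def autocorr(x, max_lag):
    """Empirical autocorrelation of a 1-D chain for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = np.dot(x, x) / n
    return np.array([np.dot(x[:n - k], x[k:]) / (n * var)
                     for k in range(max_lag + 1)])

def effective_sample_size(x, max_lag=200):
    """Crude ESS: n / (1 + 2 * sum of leading positive autocorrelations)."""
    rho = autocorr(x, max_lag)
    tau = 1.0
    for k in range(1, max_lag + 1):
        if rho[k] <= 0:  # stop once the estimate is dominated by noise
            break
        tau += 2.0 * rho[k]
    return len(x) / tau

def ar1_chain(phi, n, rng):
    """AR(1) chain whose stationary distribution is N(0, 1) for |phi| < 1."""
    x = np.empty(n)
    x[0] = rng.normal()
    noise = rng.normal(size=n) * np.sqrt(1.0 - phi ** 2)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + noise[t]
    return x

rng = np.random.default_rng(0)
fast = ar1_chain(0.10, 20_000, rng)  # stand-in for a well-mixing sampler
slow = ar1_chain(0.95, 20_000, rng)  # stand-in for a sticky sampler

ess_fast = effective_sample_size(fast)
ess_slow = effective_sample_size(slow)
print(f"ESS fast: {ess_fast:.0f}, ESS slow: {ess_slow:.0f}")
```

Both chains target the same distribution, yet the sticky one is worth far fewer independent draws. This only measures efficiency, though: it says nothing about whether the chain is targeting the right distribution in the first place.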

So I'm simplifying the problem here, and I don't mean to disparage any of the fine work on autocorrelation and mixing rates and all that, but you have to admire just how impossible this is in general.

Very interesting, thank you! I don't think I've worked on any truly large problems with these methods so I will keep your thoughts in mind when I do.
> should I be more worried that there's something invisibly going wrong there?

I have no experience with those tuners, but, absolutely. There's no way that some heuristics can generalise to every possible distribution you feed in. But if the distribution is "sufficiently" nice (which is your responsibility to ensure) then presumably they will work.
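
One way to look for things going wrong "invisibly" is to run several chains and check whether they agree. Below is a rough sketch of a split R-hat style statistic, assuming the usual between-chain versus within-chain variance comparison; the function name `split_rhat` and the synthetic "stuck" chain are made up for illustration, and this is a simplified version rather than the exact formula any particular library uses.

```python
import numpy as np

def split_rhat(chains):
    """Split R-hat for an array of shape (n_chains, n_draws).

    Each chain is cut in half so that drift within a single chain
    also shows up as disagreement between "chains".
    """
    chains = np.asarray(chains, dtype=float)
    half = chains.shape[1] // 2
    splits = np.concatenate([chains[:, :half], chains[:, half:2 * half]], axis=0)
    n = splits.shape[1]

    within = splits.var(axis=1, ddof=1).mean()        # W: mean within-chain variance
    between = n * splits.mean(axis=1).var(ddof=1)     # B: scaled variance of chain means
    var_plus = (n - 1) / n * within + between / n     # pooled variance estimate
    return np.sqrt(var_plus / within)

rng = np.random.default_rng(1)

# Four chains that all sample N(0, 1).
good = rng.normal(size=(4, 2000))

# Same draws, but one chain sits in a different region (hypothetical failure).
stuck = good.copy()
stuck[0] += 3.0

rhat_good = split_rhat(good)
rhat_stuck = split_rhat(stuck)
print(f"R-hat (agreeing chains): {rhat_good:.3f}")
print(f"R-hat (one chain off):   {rhat_stuck:.3f}")
```

A value well above 1 says the chains disagree, so something is wrong. A value near 1 only says they agree with each other, which is not the same as agreeing with the true distribution.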

That's why I said "more worried," though. Of course you can't be sure that they are doing a perfect job for every situation. But I don't think I'm going to do a better job manually than those heuristics - that's why they mostly aren't exposed to the user in the usual workflows, because people who spend their lives researching best practices for these algorithms don't think it's a good idea most of the time. If those heuristics aren't working, you're better off reparameterizing your model rather than messing with them. Hence why I don't think they are worth worrying about too much.
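
As a sketch of what "reparameterizing your model" can look like, here is the standard non-centered trick applied to a two-dimensional version of Neal's funnel, chosen by me as an illustration and not taken from the discussion above. In the centered form, the scale of `x` depends on `v`, so no single step size suits the whole space; in the non-centered form the sampler sees two independent normals. All function names are hypothetical helpers.

```python
import numpy as np

LOG_2PI = np.log(2.0 * np.pi)

def normal_logpdf(z, scale):
    """Log density of N(0, scale^2) evaluated at z."""
    return -0.5 * LOG_2PI - np.log(scale) - 0.5 * (z / scale) ** 2

def centered_logp(v, x):
    """Funnel in (v, x): v ~ N(0, 3^2), x | v ~ N(0, exp(v))."""
    return normal_logpdf(v, 3.0) + normal_logpdf(x, np.exp(v / 2.0))

def noncentered_logp(v, x_raw):
    """Same model in (v, x_raw): v ~ N(0, 3^2), x_raw ~ N(0, 1)."""
    return normal_logpdf(v, 3.0) + normal_logpdf(x_raw, 1.0)

def to_centered(v, x_raw):
    """Map non-centered draws back to the original variables."""
    return v, np.exp(v / 2.0) * x_raw

# The two densities differ only by the log-Jacobian of the transform (v / 2).
v, x_raw = -4.0, 0.7
_, x = to_centered(v, x_raw)
lhs = centered_logp(v, x)
rhs = noncentered_logp(v, x_raw) - v / 2.0
print(np.isclose(lhs, rhs))

# Conditional scale of x in the centered form at the two ends of the funnel.
scale_narrow = np.exp(-6.0 / 2.0)
scale_wide = np.exp(6.0 / 2.0)
print(f"scale of x at v=-6: {scale_narrow:.3f}, at v=+6: {scale_wide:.1f}")
```

The model is unchanged; only the coordinates the sampler works in are different, which is often an easier fix than overriding the tuning heuristics.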