by stellographer
4029 days ago
Interesting choice... Seems like genetic algorithms, particle swarms, etc. would be more attractive choices, since they solve the same problem and are inherently parallelizable, while Metropolis-Hastings is almost 100 years old and was designed for a four-function calculator. Although I guess some people have meta-parallelized it... but it still seems like a patch job compared to modern likelihood navigation algos.
|
The point of Metropolis-Hastings is to sample from a distribution when you do not know the partition function (the normalizing constant). It is one of the most important building blocks in a family of algorithms broadly known as Markov chain Monte Carlo. These algorithms are particularly useful in Bayesian statistics, where the posterior is typically known only up to that constant.
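To make that concrete, here is a minimal sketch of a random-walk sampler in Python. It assumes a symmetric Gaussian proposal (strictly the Metropolis special case of MH), and the target `unnormalized_log_density` is a hypothetical stand-in: a standard normal with its constant dropped. The names, step size, and seed are illustrative choices only.

```python
import math
import random


def unnormalized_log_density(x):
    # Hypothetical target: log of exp(-x^2 / 2), i.e. a standard normal
    # with its normalizing constant deliberately left out.
    return -0.5 * x * x


def metropolis_hastings(log_p, n_samples, x0=0.0, step=1.0, seed=0):
    rng = random.Random(seed)
    x = x0
    log_px = log_p(x)
    samples = []
    for _ in range(n_samples):
        # Symmetric random-walk proposal, so the proposal ratio cancels.
        proposal = x + rng.gauss(0.0, step)
        log_pp = log_p(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        # Only the ratio is needed, so the unknown constant cancels too.
        if math.log(1.0 - rng.random()) < log_pp - log_px:
            x, log_px = proposal, log_pp
        # A rejected proposal repeats the current state in the chain.
        samples.append(x)
    return samples
```

Calling `metropolis_hastings(unnormalized_log_density, 20000)` should give draws whose mean and variance approach 0 and 1, even though the sampler never sees the normalizing constant.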
Genetic algorithms will not give you samples from a distribution; they only perform optimization. Particle swarms also focus on optimization and, on top of that, do not seem to have either theoretical justification or empirical success.
MH is embarrassingly parallel, since you can run multiple independent chains at the same time. Again, the point isn't optimization (that would be simulated annealing) but sampling.
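A rough sketch of the multiple-chains idea, under the same assumptions as any toy example: `log_p` is a hypothetical unnormalized target, and `run_chain` / `run_chains` are illustrative names. Each chain depends only on its own seed, so the chains can be handed to any map-like function.

```python
import math
import random


def log_p(x):
    # Hypothetical unnormalized log-density (standard normal, constant dropped).
    return -0.5 * x * x


def run_chain(seed, n_samples=5000, step=1.0):
    rng = random.Random(seed)   # each chain gets its own RNG stream
    x = rng.gauss(0.0, 3.0)     # overdispersed starting point
    out = []
    for _ in range(n_samples):
        y = x + rng.gauss(0.0, step)
        # Metropolis accept/reject on the unnormalized density ratio.
        if math.log(1.0 - rng.random()) < log_p(y) - log_p(x):
            x = y
        out.append(x)
    return out


def run_chains(seeds, map_fn=map):
    # The chains share no state, so map_fn can be the built-in map
    # (serial) or a parallel map with the same calling convention.
    return list(map_fn(run_chain, seeds))


chains = run_chains([1, 2, 3, 4])
# Drop an (arbitrarily chosen) burn-in from each chain, then pool the draws.
pooled = [x for chain in chains for x in chain[1000:]]
```

Here the built-in `map` runs the chains one after another. Passing something like `concurrent.futures.ProcessPoolExecutor().map` as `map_fn` should spread them across processes, provided `run_chain` is defined at module level so it can be pickled.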
Being 100 years old is also largely irrelevant. People publish new algorithms all the time to get publications. That does not mean they necessarily outperform the old ones. Gradient descent is the basic algorithm used to train all of these cool new deep learning models, and it's much older than MH.
Yes, there are more recent improvements to MH, the two biggest ones being Hamiltonian Monte Carlo (which uses gradient information) and Parallel Tempering (which is somewhat similar to homotopy optimization), but that's hardly a reason to dismiss the importance of this algorithm.