| I've often brought up GPUs while talking to folks, because they're interesting and offer a world of potential. I have two computers, one with a Ryzen 1950X and the other with an i9 7900X. Both CPUs cost about the same, but the i9 (with AVX-512) is close to 4 times faster at matrix multiplication. Yet it is still about 10x slower than a cheaper Vega 64 GPU. But the folks I talk to aren't generally computer scientists. They're statisticians and academics, mostly. A few have tried, but they haven't been successful. There are libraries like rocRAND / cuRAND for random number generation. It's probably possible, and I just need to sit down and really experiment. For the MCMC chains (running inside the Monte Carlo simulation), Hamiltonian Monte Carlo sounds more feasible than Gibbs sampling. In Gibbs sampling, you need lots of random draws from different conditional distributions. You often get these from accept/reject algorithms, i.e., lots of fine-grained control flow.
And ideally, each MCMC run has at least an entire work group dedicated to it. You don't want the entire work group calculating a small handful of gamma random numbers (with all the rest masked). The parameters of the gammas are not known in advance, so they cannot be pre-sampled. Hamiltonian Monte Carlo is probably much friendlier. However, I have heard concerns that the symplectic integrator it uses needs a high degree of accuracy to avoid diverging. That is, it needs 64-bit precision. GPUs with good 64-bit throughput are well outside of my budget. Although, I could look into tricks like double-singles for the accuracy-critical parts of the computation. The simulation I mentioned in my previous comment was using Hamiltonian Monte Carlo. However, each iteration was rather involved, and while much of it is vectorizable (e.g., matrix factorizations and inversions), doing so on a GPU is AFAIK not trivial.
It seems like a gigantic leap in complexity. |
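
To make the double-single idea a bit more concrete, here is a minimal sketch. It is not GPU code: it emulates 32-bit floats on the CPU by rounding through `struct`, and the names `f32`, `two_sum`, `ds_add`, `accumulate_plain` and `accumulate_ds` are made up for illustration. A value is stored as a (hi, lo) pair of 32-bit floats, and an error-free addition (Knuth's two-sum) recovers the rounding error that a plain 32-bit add would throw away. I'm assuming a long running sum is a fair stand-in for the accuracy-critical part of an integrator; a real kernel would need multiplication too.

```python
import struct

_F32 = struct.Struct("f")

def f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return _F32.unpack(_F32.pack(x))[0]

def two_sum(a, b):
    """Error-free addition of two 32-bit floats.

    Returns (s, e) where s is the rounded 32-bit sum and e is the
    rounding error, so that s + e equals a + b exactly.
    """
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e

def ds_add(x, y):
    """Add two double-single numbers, each a (hi, lo) pair of 32-bit floats."""
    s, e = two_sum(x[0], y[0])
    e = f32(e + f32(x[1] + y[1]))   # fold in the low-order parts
    hi = f32(s + e)
    lo = f32(e - f32(hi - s))       # renormalize so lo stays small
    return hi, lo

def accumulate_plain(n, step):
    """Sum `step` n times using plain 32-bit arithmetic."""
    total = f32(0.0)
    for _ in range(n):
        total = f32(total + step)
    return total

def accumulate_ds(n, step):
    """Sum `step` n times using double-single arithmetic."""
    total = (f32(0.0), f32(0.0))
    for _ in range(n):
        total = ds_add(total, (step, f32(0.0)))
    return total

if __name__ == "__main__":
    n = 100_000
    step = f32(0.1)
    exact = n * step                # exact in 64-bit for these inputs
    hi, lo = accumulate_ds(n, step)
    print("plain 32-bit error: ", abs(accumulate_plain(n, step) - exact))
    print("double-single error:", abs((hi + lo) - exact))
```

The plain 32-bit accumulator drifts noticeably over many steps, while the (hi, lo) pair stays close to the 64-bit reference. The cost is roughly ten 32-bit operations per addition, so whether that beats native 64-bit on a given GPU is something I'd have to measure.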