

by standevbob 2006 days ago

Stan supports optimization (L-BFGS) to find (penalized) maximum likelihood or MAP estimates where they exist. Bayesian estimates are typically posterior means, which are computed with MCMC rather than optimization, and in high dimensions the result is usually far from the maximum likelihood estimate. I wrote a case study with some simple examples here: https://mc-stan.org/users/documentation/case-studies/curse-d...
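
To make the gap between a posterior mean and a maximum likelihood estimate concrete, here is a minimal Python sketch. It is not taken from the case study. It assumes d independent binomial success probabilities, each with a uniform Beta(1, 1) prior and the same made-up data (y successes in n trials), so both estimates have closed forms and no MCMC is needed. The function names are hypothetical.

```python
import math

def mle(y, n):
    """Maximum likelihood estimate of a binomial success probability."""
    return y / n

def posterior_mean(y, n):
    """Posterior mean under a uniform Beta(1, 1) prior, i.e. Beta(y + 1, n - y + 1)."""
    return (y + 1) / (n + 2)

def distance(d, y=0, n=10):
    """Euclidean distance between the two d-dimensional estimate vectors.

    Assumes every coordinate has the same (hypothetical) data y, n.
    """
    gap = posterior_mean(y, n) - mle(y, n)
    return math.sqrt(d) * abs(gap)

for d in (1, 100, 10_000):
    print(d, round(distance(d), 3))
```

With these assumed numbers the per-coordinate gap stays at 1/12, but the distance between the two estimate vectors grows like the square root of d, which is one simple way the two kinds of estimate drift apart as dimension increases.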

Adding new parameters scales as O(N^(5/4)) in HMC, where N is the number of parameters, whereas it scales as O(N^2) in Metropolis or Gibbs. It's quadrature that scales exponentially in dimension. There's also a constant factor for posterior correlation, which can get nasty. I regularly fit regressions for epidemiology or genomics or education with tens or even hundreds of thousands of parameters on my notebook with one core and no GPU.
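
A rough Python sketch of what those growth rates mean when the parameter count goes from 1,000 to 100,000. It only compares the exponents quoted above and ignores all constant factors (including the posterior correlation one). The quadrature line assumes a plain tensor-product grid with a hypothetical k = 10 points per dimension, which is my choice for illustration.

```python
def hmc_cost(n):
    """Relative cost under the O(N^(5/4)) scaling quoted for HMC."""
    return n ** 1.25

def metropolis_cost(n):
    """Relative cost under the O(N^2) scaling quoted for Metropolis or Gibbs."""
    return n ** 2

def quadrature_points(n, k=10):
    """Grid size for tensor-product quadrature with k points per dimension (assumed)."""
    return k ** n

small, large = 1_000, 100_000

# How much more work the larger model needs, constants ignored.
hmc_growth = hmc_cost(large) / hmc_cost(small)
metropolis_growth = metropolis_cost(large) / metropolis_cost(small)

print(f"HMC: about {hmc_growth:.0f}x more work")
print(f"Metropolis/Gibbs: about {metropolis_growth:.0f}x more work")

# The quadrature grid is already astronomically large at the smaller size.
digits = len(str(quadrature_points(small)))
print(f"Quadrature grid size at N={small} has {digits} digits")
```

Under these assumptions a 100-fold increase in parameters costs HMC a few hundred times more work, Metropolis or Gibbs ten thousand times more, and the quadrature grid is out of reach well before either size.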

MCMC or optimization can be sub-linear or super-linear in the data, depending on the statistical properties of the posterior. Some non-parametric models like Gaussian processes can be cubic in the data size. Regressions, on the other hand, are often sub-linear (doubling the data doesn't double computation time): with more data the posterior is better behaved (closer to normal, in the Gaussian sense), so it can be explored in fewer log density and gradient evaluations.