Hacker News
by CuriouslyC 3258 days ago
If you don't have a gradient, one tactic is to make the most of the situation. Give your model the Bayesian treatment and sample from the posterior with a gradient-free MCMC method. This is slow, but you end up with full posterior distributions over your parameter values instead of point estimates, which is a huge win.
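A minimal sketch of the gradient-free route, using random-walk Metropolis (one MCMC method that only ever evaluates the log posterior, never its gradient). The toy model is an assumption for illustration: infer the mean `mu` of Gaussian data with known noise sd 1 and a N(0, 10^2) prior. The names `log_posterior` and `metropolis` are hypothetical, not from any library.

```python
import math
import random

def log_posterior(mu, data, prior_sd=10.0, noise_sd=1.0):
    """Unnormalized log posterior for the mean of Gaussian data (hypothetical toy model)."""
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_lik = sum(-0.5 * ((x - mu) / noise_sd) ** 2 for x in data)
    return log_prior + log_lik

def metropolis(log_p, x0, n_samples, step=0.3, burn_in=1000, seed=0):
    """Random-walk Metropolis: needs only evaluations of log_p, never its gradient."""
    rng = random.Random(seed)
    x, lp = x0, log_p(x0)
    samples = []
    for i in range(burn_in + n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        lp_new = log_p(proposal)
        # Accept with probability min(1, p(proposal) / p(x)); 1 - random() avoids log(0)
        if math.log(1.0 - rng.random()) < lp_new - lp:
            x, lp = proposal, lp_new
        if i >= burn_in:  # discard the burn-in draws
            samples.append(x)
    return samples

# Usage: synthetic data with true mean 3.0
data_rng = random.Random(42)
data = [data_rng.gauss(3.0, 1.0) for _ in range(50)]
samples = metropolis(lambda mu: log_posterior(mu, data), x0=0.0, n_samples=5000)
mean = sum(samples) / len(samples)
sd = (sum((s - mean) ** 2 for s in samples) / len(samples)) ** 0.5
print(f"posterior mean ~ {mean:.3f}, posterior sd ~ {sd:.3f}")
```

The payoff is the last two lines: you get a spread (`sd`) on the parameter for free, not just a best guess. The cost is thousands of log-posterior evaluations for a one-parameter model.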

Yeah, I've been a big fan of probabilistic programming for a while. The real problem is that getting Monte Carlo methods to converge and produce a large sample from the posterior takes orders of magnitude more time than running an optimizer to descend a gradient. Hey, you can even get the gradient back in a probabilistic setting: variational inference, which turns approximating the posterior into an optimization problem! But then you're back to having a hard time with discrete, nondifferentiable structure.
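A rough sketch of the "probabilistic gradient" idea: stochastic variational inference with reparameterization gradients, fitting q(mu) = N(m, s^2) to the same kind of toy model (Gaussian data, known noise sd 1, N(0, 10^2) prior; all assumptions for illustration). The helper names `grad_log_joint` and `fit_gaussian_vi`, the step size, and the iterate averaging are hypothetical choices, not anyone's reference implementation. Note that it needs `grad_log_joint`, the derivative of the log joint with respect to mu, which is exactly what you lose when the structure is discrete.

```python
import math
import random

def grad_log_joint(mu, data, prior_sd=10.0, noise_sd=1.0):
    """d/dmu of log p(data, mu) for the hypothetical Gaussian-mean model."""
    return -mu / prior_sd ** 2 + sum(x - mu for x in data) / noise_sd ** 2

def fit_gaussian_vi(data, steps=5000, lr=0.002, n_mc=10, seed=0):
    """Stochastic gradient ascent on the ELBO for q(mu) = N(m, s^2), s = exp(log_s)."""
    rng = random.Random(seed)
    m, log_s = 0.0, -1.0
    m_sum = log_s_sum = 0.0
    n_avg = 0
    for t in range(steps):
        s = math.exp(log_s)
        g_m = g_log_s = 0.0
        for _ in range(n_mc):
            eps = rng.gauss(0.0, 1.0)
            mu = m + s * eps  # reparameterized draw from q
            g = grad_log_joint(mu, data)
            g_m += g / n_mc  # chain rule: dmu/dm = 1
            g_log_s += g * s * eps / n_mc  # chain rule: dmu/dlog_s = s * eps
        g_log_s += 1.0  # gradient of the Gaussian entropy term (log s + const)
        m += lr * g_m
        log_s += lr * g_log_s
        if t >= steps // 2:  # average the second half of the iterates to damp noise
            m_sum += m
            log_s_sum += log_s
            n_avg += 1
    return m_sum / n_avg, math.exp(log_s_sum / n_avg)

# Usage: synthetic data with true mean 3.0
data_rng = random.Random(42)
data = [data_rng.gauss(3.0, 1.0) for _ in range(50)]
m, s = fit_gaussian_vi(data)
print(f"q(mu) = N({m:.3f}, {s:.3f}^2)")
```

In this conjugate toy case the exact posterior is itself Gaussian, so the variational family contains it and the fit should land close to the MCMC answer. On real models q is usually only an approximation.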