Hacker News new | ask | show | jobs
by usgroup 758 days ago
Does that imply that every possible branch is optimised separately and then convoluted thereafter?

If so, does it scale for very branchy programs?

Do you have any comparisons to a Gibbs based approach for any of the use case examples?

1 comments

The convolution is approximated via a form of sampling with additional bookkeeping at each encountered branch. How well that scales for deeply branching programs depends on the probabilities of the branches and the diversity in the output on the resulting paths, the worst case being a program where all branches are equally likely and each path generates an entirely different output (as in a hash function, for example). In practice, we've been dealing with problems involving up to tens of thousands of branches or so.

We've haven't done a direct comparison to MCMC approaches yet, but it's on the Todo list. My intuition is that MCMC will win out for problems where finding "just any" local optimum is not good enough.

Thank you. Your helpfulness and balanced evaluations are refreshing.