Hacker News new | ask | show | jobs
by currymj 3093 days ago
it seems like it's made by analogy with "probabilistic programming", i.e. defining complex probability distributions by writing familiar-looking imperative code (with loops and whatnot).

I think the idea is that thinking in terms of passing data through layers in a graph is cumbersome sometimes, and that expressing it as a "regular" program that just happens to come with gradients could be more comfortable.

I'd argue that GANs in particular are a natural fit for this style. The training procedure doesn't really fit exactly into the standard "minimize loss function of many layers using backprop".