I don't think it has to be either or. For example, you can implement a Deep Net in a probabilistic programming framework (http://twiecki.github.io/blog/2016/07/05/bayesian-deep-learn...) like PyMC3. The inference here is not using sampling but rather ADVI (http://pymc-devs.github.io/pymc3/api.html#advi) which is almost as general but much much faster, and can be run on sub-samples of the data (similar to stochastic gradient descent used in deep learning). Once we bridge these two domains we can get the best of both worlds, like a deep net HMM.