Hacker News
“Everything that works works because it's Bayesian” Why Deep Nets Generalize? (inference.vc)
40 points by fhuszar 3318 days ago
2 comments

If a DNN is just a crappy approximation of some kind of Bayesian inference, then where are the better approximations that beat it on all the metrics we care about? And if that magical thing does exist, why aren't people using it to beat the pants off the DNN people and take their lunch money?
We need to differentiate between the theory of neural networks, which does not have robust underpinnings, and practical considerations.

DNNs are fantastic from a computational standpoint. It's GEMM all the way down. You get high FLOP counts and, with modern techniques, gradient-based methods find optima relatively reliably.

But from a theoretical standpoint there are major question marks. Why does dropout work? Why has SGD been so successful? To make the field more rigorous, these need to be worked out. Working them out will also make DNNs more powerful, more generalizable (as Ferenc noted), and more useful. I'll also add that it might help us discover fundamental laws of intelligence.

As evidence of this approach being useful, I'll note that Yann LeCun is openly Bayesian.

The point is that you can view things as an (approximate) form of Bayesian inference, in order to think differently about how they work and what they're doing.

Another common example is ridge regression being expressed as a Bayesian regression (where the "ridge" penalty comes from a Gaussian prior on the weights).
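
To make the ridge example concrete, here is a minimal sketch in Python. The data, the noise level `sigma`, and the prior width `tau` are made up for illustration. It assumes a zero-mean Gaussian prior on the weights and Gaussian noise, in which case the ridge solution with penalty `lam = sigma**2 / tau**2` should coincide with the posterior mean:

```python
import numpy as np

# Synthetic data; every number here is an illustrative assumption.
rng = np.random.default_rng(0)
n, d = 50, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
sigma = 0.5  # assumed noise std:  y ~ N(X w, sigma^2 I)
tau = 2.0    # assumed prior std:  w ~ N(0, tau^2 I)
y = X @ w_true + sigma * rng.normal(size=n)

# Ridge regression: argmin_w ||y - X w||^2 + lam * ||w||^2
lam = sigma**2 / tau**2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Bayesian linear regression: the posterior over w is Gaussian with
# precision X^T X / sigma^2 + I / tau^2 and the mean computed below.
post_precision = X.T @ X / sigma**2 + np.eye(d) / tau**2
w_post_mean = np.linalg.solve(post_precision, X.T @ y / sigma**2)

# The two estimates agree up to floating-point error.
print(np.allclose(w_ridge, w_post_mean))
```

The Bayesian view gives you the same point estimate, plus a posterior covariance (the inverse of `post_precision`) that plain ridge does not report.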

I'm surprised that NNs can memorize the data. I'd have imagined there would not be enough units to memorize everything.

But if we have a network that has essentially memorized a random dataset, how is it functionally different from a nearest neighbor algorithm?
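
For reference, this is roughly what the nearest-neighbor side of that comparison looks like. It is a sketch on made-up random data, not anything from the article: a 1-nearest-neighbor classifier trivially "memorizes" random labels (each training point is its own nearest neighbor) and should do no better than chance on fresh random data.

```python
import numpy as np

def one_nn_predict(X_train, y_train, X_query):
    """Label each query point with the label of its nearest training point."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    return y_train[d2.argmin(axis=1)]

# Random inputs with random binary labels: there is nothing to learn.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))
y_train = rng.integers(0, 2, size=200)
X_test = rng.normal(size=(2000, 10))
y_test = rng.integers(0, 2, size=2000)

train_acc = (one_nn_predict(X_train, y_train, X_train) == y_train).mean()
test_acc = (one_nn_predict(X_train, y_train, X_test) == y_test).mean()

print(train_acc)  # perfect recall of the training labels
print(test_acc)   # expected to hover around chance
```

Whether a network that fits random labels is doing anything different from this is exactly the open question.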

And there we have the million-dollar question that people are still struggling to answer: why do neural nets work when existing theory says they should not?