Hacker News
by azag0 3113 days ago
I think this recent paper [1] sheds quite a bit of light on this.

[1] https://arxiv.org/abs/1703.00810v3

I really don't think that's the best paper to cite as "shedding quite a bit of light on this". It has been somewhat controversial since it came out.

I think https://arxiv.org/abs/1609.04836 is seminal in showing that flat (i.e., non-sharp) minima are associated with good generalization. The parent's paper is good for showing that gradient descent over non-convex surfaces works fine. And https://arxiv.org/abs/1611.03530 is the landmark paper that kicked off this whole generalization business: it mainly shows that traditional models of generalization, namely VC dimension and notions of "capacity", don't make sense for neural nets.
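To make the sharp-vs-flat distinction concrete, here is a toy sketch. It assumes a simplified, one-dimensional version of a sharpness measure in the spirit of the one in 1609.04836: the largest loss increase within a small box around the minimum, relative to 1 + the loss at the minimum, scaled by 100. It approximates the maximum with a grid search. The functions `sharpness`, `sharp_loss`, and `flat_loss` and the choice `eps=0.1` are hypothetical illustrations, not the paper's code.

```python
# Toy sketch of a sharpness measure in the spirit of arXiv:1609.04836.
# Assumption: simplified to 1-D and approximated by a uniform grid search;
# the loss functions below are hypothetical examples, not from the paper.


def sharpness(loss, x_star, eps=0.1, steps=1000):
    """Return 100 * (max loss in box - loss(x_star)) / (1 + loss(x_star)).

    The box is |delta| <= eps * (|x_star| + 1), scanned on a uniform grid.
    """
    radius = eps * (abs(x_star) + 1)
    base = loss(x_star)
    worst = base
    for i in range(steps + 1):
        delta = -radius + 2 * radius * i / steps
        worst = max(worst, loss(x_star + delta))
    return 100 * (worst - base) / (1 + base)


def sharp_loss(x):
    # Narrow, steep valley: the loss rises quickly away from the minimum at 0.
    return 50.0 * x * x


def flat_loss(x):
    # Wide, shallow valley with the same minimum value (0) at the same point.
    return 0.5 * x * x


if __name__ == "__main__":
    print(f"sharp minimum: {sharpness(sharp_loss, 0.0):.2f}")
    print(f"flat minimum:  {sharpness(flat_loss, 0.0):.2f}")
```

Both toy losses reach the same minimum value, but a small perturbation costs far more in the sharp valley. That gap is the intuition behind linking flat minima to better generalization: a solution that sits in a wide basin is less sensitive to the shift between training and test loss surfaces.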