by jackson1372 2384 days ago
The reason you want to over-parameterize your model is that it protects you from "bad bounce" learning trajectories. You effectively spread out your overfitting risk until it's pretty close to 0.

Or at least that's the way I like to think of it.

The next step is to compress the resulting model into a simpler, less computationally costly network.
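For what it's worth, one common way to do that kind of compression is knowledge distillation, where the small network is trained to match the big network's softened outputs. The comment doesn't say which method is meant, so treat this as an assumption. Here's a rough sketch of just the loss term in plain Python; `softmax`, `distillation_loss`, and the `temperature` default are hypothetical names and values for illustration, not anything from a specific library.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened outputs, scaled by temperature^2.

    Assumption: this is the usual soft-target term only; a real setup would
    typically add a cross-entropy term on the true labels as well.
    """
    p = softmax(teacher_logits, temperature)  # big model's soft targets
    q = softmax(student_logits, temperature)  # small model's predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Made-up logits: a student that agrees with the teacher gets zero loss,
# one that disagrees gets a positive loss.
teacher = [4.0, 1.0, 0.5]
print(distillation_loss(teacher, teacher))
print(distillation_loss(teacher, [0.5, 1.0, 4.0]))
```

The idea is that the student gets a richer signal from the teacher's full output distribution than from hard labels alone, which is part of why a smaller network can sometimes recover most of the big one's accuracy.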

1 comment

Are you suggesting double descent (dd) is sort of about local minima? Like, if you extended the risk vs. parameterization curve further out, you'd start to see overfitting again?