Hacker News
by antognini 2056 days ago
Depth seems to matter more than width, at least as long as the layers are sufficiently wide. In fact, in the limit where a layer becomes infinitely wide, you just end up with a Gaussian process. In practice a width of ~100-1000 is enough to get behavior pretty close to a Gaussian process, so in general doubling the width of a layer doesn't gain you all that much compared to spending those parameters on an extra layer. The real representational power seems to come from increasing depth.
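A rough way to see the wide-layer limit numerically: draw many randomly initialized one-hidden-layer tanh networks, evaluate each at one fixed input, and check how Gaussian the resulting outputs look. This is only a sketch under assumptions of my own (Gaussian weights scaled by 1/sqrt(fan-in), no biases, a fixed input of all ones, and excess kurtosis as the non-Gaussianity measure, which is 0 for an exact Gaussian). The names `network_output`, `excess_kurtosis` and `kurtosis_by_width` are made up for illustration, and it only probes the distribution over random initializations, not trained networks.

```python
import math
import random


def network_output(width, x, rng):
    """Output of one randomly initialized 1-hidden-layer tanh network at input x.

    Assumption: weights are Gaussian with variance 1/fan_in, and there are no biases.
    """
    d = len(x)
    total = 0.0
    for _ in range(width):
        # Pre-activation of one hidden unit: a random linear combination of x.
        pre = sum(rng.gauss(0.0, 1.0 / math.sqrt(d)) * xi for xi in x)
        # Output weight scaled so the output variance stays O(1) as width grows.
        total += rng.gauss(0.0, 1.0 / math.sqrt(width)) * math.tanh(pre)
    return total


def excess_kurtosis(samples):
    """Fourth standardized moment minus 3 (0 for an exact Gaussian)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    fourth = sum((s - mean) ** 4 for s in samples) / n
    return fourth / var ** 2 - 3.0


def kurtosis_by_width(widths=(1, 10, 100), n_nets=3000, seed=0):
    """Excess kurtosis of the output across n_nets random networks, per width."""
    rng = random.Random(seed)
    x = [1.0, 1.0, 1.0]  # fixed, arbitrary input (an assumption of this sketch)
    return {
        w: excess_kurtosis([network_output(w, x, rng) for _ in range(n_nets)])
        for w in widths
    }


kurt = kurtosis_by_width()
for w, k in kurt.items():
    print(f"width={w:4d}  excess kurtosis={k:+.3f}")
```

With this setup the excess kurtosis should be clearly positive at width 1 and shrink toward 0 as the width grows (roughly like 1/width, since the output is a sum of independent terms), up to sampling noise. By a width of about 100 the output at a single input is already hard to tell apart from a Gaussian.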