by iflp
2104 days ago
It depends on what kind of understanding you want to achieve. It can be helpful to think of DNNs as approximations of their infinitely wide counterparts. Depending on how the parameters and learning rate are scaled with width, these limits behave either like a linear filter acting on the error signal in function space (the neural tangent kernel regime) or, at least for single-hidden-layer networks, like an interacting particle system (the mean-field regime). In both cases these analogies let you analyze the convergence of gradient-descent training, although gaps remain between the theory and real-world practice.
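To make the "linear filter of the error signal" picture concrete, here is a minimal sketch, assuming squared loss and a kernel that stays fixed during training (the idealized infinite-width assumption). Under that assumption the outputs on the training set evolve as f ← f − lr·K(f − y), so each eigen-component of the error is multiplied by (1 − lr·λᵢ) per step: large-eigenvalue directions are fitted quickly and small ones slowly. The helper names `function_space_gd` and `predicted_residual` are hypothetical, and the RBF kernel is only a stand-in for an actual NTK.

```python
import numpy as np

def function_space_gd(K, y, f0, lr, steps):
    """Kernel gradient descent on squared loss, directly in function space.

    Assumes a fixed kernel K (idealized infinite-width picture), so the
    training-set outputs follow f <- f - lr * K @ (f - y).
    """
    f = f0.copy()
    for _ in range(steps):
        f = f - lr * K @ (f - y)
    return f

def predicted_residual(K, y, f0, lr, steps):
    """Closed form of the same dynamics: a linear filter on the error.

    With K = V diag(lam) V^T, the error component along eigenvector i
    is scaled by (1 - lr * lam_i) ** steps.
    """
    lam, V = np.linalg.eigh(K)
    r0 = V.T @ (f0 - y)                 # initial error in the eigenbasis
    return V @ ((1.0 - lr * lam) ** steps * r0)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
# Hypothetical stand-in for an NTK: an RBF kernel on random inputs (PSD).
sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-0.5 * sq_dist)
y = rng.normal(size=20)
f0 = np.zeros(20)
lr = 0.5 / np.linalg.eigvalsh(K).max()  # keeps every factor 1 - lr*lam in [0, 1)

steps = 200
f = function_space_gd(K, y, f0, lr, steps)
print("iterated residual norm:   ", np.linalg.norm(f - y))
print("closed-form residual norm:", np.linalg.norm(predicted_residual(K, y, f0, lr, steps)))
```

The two printed norms should agree up to floating-point error, which is the point: under the fixed-kernel assumption, training is fully described by a per-eigendirection decay rate. The mean-field regime does not reduce to a fixed linear filter like this, because the kernel itself moves as the particles (neurons) move.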