	by iflp 2104 days ago
	It depends on what kind of understanding you're after. It can help to think of a DNN as approximating its infinitely wide counterpart. Depending on how the parameters are scaled with width, training in that limit acts either like a linear filter applied to the error signal in function space (the NTK regime) or, at least for single-hidden-layer networks, like an interacting particle system (the mean-field regime). In both cases these analogies let you analyze the convergence of gradient-descent training, although gaps from real-world practice remain.
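To make the "linear filter" picture concrete, here is a minimal NumPy sketch. It is not an infinite-width network: as a stand-in, it assumes a model with fixed random ReLU features where only the output weights are trained, so the model is exactly linear in its parameters. The data, the width, and all variable names are made up for illustration. Under squared loss, each gradient step then multiplies the training error by `(I - lr * K)`, where `K` is the kernel matrix of the features on the training points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data (hypothetical, for illustration only).
n, width = 20, 2000
x = np.linspace(-1.0, 1.0, n)
y = np.sin(3.0 * x)

# Fixed random ReLU features scaled by 1/sqrt(width). Only the output
# weights theta are trained, so the model is exactly linear in theta.
w = rng.normal(size=width)
b = rng.normal(size=width)
phi = np.maximum(0.0, np.outer(x, w) + b) / np.sqrt(width)  # shape (n, width)

# Kernel matrix on the training points, shape (n, n).
kernel = phi @ phi.T
# Step size chosen so every eigenvalue of (I - lr * K) lies in [0, 1].
lr = 1.0 / np.linalg.eigvalsh(kernel).max()

theta = np.zeros(width)
residual_pred = -y.copy()  # error f - y at initialization, where f = 0
steps = 200
for _ in range(steps):
    residual = phi @ theta - y
    # Parameter-space view: gradient step on 0.5 * ||f - y||^2.
    theta -= lr * phi.T @ residual
    # Function-space view: apply the linear filter (I - lr * K) to the error.
    residual_pred = residual_pred - lr * kernel @ residual_pred

residual_final = phi @ theta - y
max_gap = np.max(np.abs(residual_final - residual_pred))
print("largest gap between the two views:", max_gap)
print("error norm before / after:", np.linalg.norm(y), np.linalg.norm(residual_final))
```

The two views agree up to floating-point error because the model is linear in `theta`. For a real finite-width network the kernel changes during training, which is one of the gaps from practice mentioned above. Error components along large-eigenvalue directions of `K` shrink quickly, while those along small-eigenvalue directions persist, which is why "filter" is a fair name.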