by cfgauss2718
820 days ago
I haven’t read the manuscript yet, and I’m not sure that I will. However, I don’t agree with the question. Gradient descent and the properties of the loss function are the “how”. It seems like you want to know how some properties of the data are manifested in the network itself during or after training (what these properties are doesn’t seem to be something that people know they are looking for). Maybe that’s what the authors are interested in as well. If I could bet money in Vegas on the answer to that question, my bet would be that in most cases, when we probe structures in the network and see in them correlations to aspects of the problem or task that we (as humans) can recognize, this will very likely boil down to approximations of fundamental and eminently useful quantities like, say, approximate singular value decompositions of regions of the data manifold, or approximate eigenfunctions, etc. I can see how these kinds of empirical investigations are interesting, but what would their impact be? Another guess: these investigations may lead to insights that help engineers design better architectures or incrementally improve training methods. But I think that’s about it - this type of research strikes me as engineering and application.
Pretty much everything about NNs is engineering - it's basically an empirical technology, not one that we have much theoretical understanding of outside of the very basics.