by sdenton4
2026 days ago
Agree, but I also think the result may be too general to be useful. Proving that you can rewrite any network learned with gradient descent this way kinda suggests that the architecture doesn't matter, but we know that's not true. E.g., why are networks with skip connections SO much better than networks without? What about batch normalization?

This makes me suspicious that it's a nice theoretical result that's a bit too high level to be useful. Yes, it was proved years ago that you can fit an arbitrary function with a wide enough two-layer net, but it's not a terribly practical way to approach the world. Now we have architectures much better than two-layer networks and, for that matter, than SVMs.

There are a number of problems with SVMs; complexity for training and inference scales with the amount of training data, which is pretty sad panda for complex problems.

Extremely spicy/cynical take: it's not cool to say "you all should go look at all these possible applications" when the thrust of the paper is to prop up the relevance of an obsolete approach. You gotta do the actual work to close the gap if you still want your PhD to be worth something...

That said, I haven't read the paper terribly closely, and am always happy to be proven wrong!