by reader5000 3940 days ago
I'm not advocating GP > nn+grad. I'm just saying nobody knows why nn+grad performs better on practical problems. Also, nn+grad methods were essentially complete failures for the first 3-4 decades of their existence (invented in the 60s, legit results like LeCun's or Hinton's in the 90s/00s).
1 comments

This is so wrong on so many levels (sadly quite characteristic of the GA crowd) that I can only suggest reading up on the background and the math of it. Good keywords would be stochastic gradient descent, spin glasses and the renormalization group.
Whether NNs have an interpretation as spin systems says nothing about how hard they are to train. I.e., minimizing "free energy" in spin systems is just as NP-hard as minimizing training error in NNs.