I wonder if it is because, with backpropagation, all the non-linear functions are chained together when the weights are learnt. Is it naive to think that, under that formulation, the results will be quite similar because the final model equations are close?
Excellent write-up. Coincidentally, I've been experimenting with the same thing. Any idea whether this approach could be used to speed up a search for exact solutions?