It's a fair point that the Universal Approximation Theorem does not guarantee that the weights can be learned. OTOH, the physical laws that the article states a neural network cannot discover are computable functions.
You need a stronger bound than this. They have to be possible to approximate given a specific network size, architecture and activation function. Calculating that (or good statistics that would say so approximately) is a hard problem...
It is solvable for a bunch of activations in a layered perceptron, but try extending this to something more complex.
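
To make the "depends on network size" point concrete, here is a rough sketch, not a proof of anything. It assumes a one-hidden-layer ReLU network whose hidden weights are fixed at random and only the output layer is fit by least squares, so the numbers are just an upper bound on the best error achievable at each width. The target `sin(x)`, the function name `relu_fit_error` and all parameters are made up for illustration.

```python
import numpy as np

def relu_fit_error(widths, n_points=200, seed=0):
    """RMSE of fitting sin(x) on [-pi, pi] with random ReLU features.

    Assumption: hidden weights are random and frozen; only the output
    layer is solved by least squares. This is NOT full network training.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(-np.pi, np.pi, n_points)
    y = np.sin(x)

    # Draw the widest hidden layer once; narrower nets reuse its first
    # columns, so each wider model contains every narrower one.
    max_w = max(widths)
    w = rng.normal(size=max_w)
    b = rng.uniform(-np.pi, np.pi, size=max_w)
    hidden = np.maximum(0.0, np.outer(x, w) + b)  # shape (n_points, max_w)

    errors = {}
    for n in widths:
        # First n hidden units plus a constant column for the output bias.
        feats = np.column_stack([hidden[:, :n], np.ones(n_points)])
        coef, *_ = np.linalg.lstsq(feats, y, rcond=None)
        residual = feats @ coef - y
        errors[n] = float(np.sqrt(np.mean(residual ** 2)))
    return errors

if __name__ == "__main__":
    for width, err in relu_fit_error([2, 8, 64]).items():
        print(f"width={width:3d}  rmse={err:.4f}")
```

The error shrinks as the width grows, but this only tells you what one particular width can reach on one easy 1-D target. It says nothing general about which width a given law needs, which is the hard part.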