The universal approximation theorem does not apply once you include any realistic training algorithms / stochastic gradient descent. There isn't a learnability guarantee.
You said it only depends on network size, I'm saying it more likely is impossible regardless of network size due to fundamental limits in training methods.