Statistically-random excellent performance in complex tasks is very unlikely. More likely is that examples are in-sample from a small training set or very similar. Big NNs can memorize anything.
They can memorize any supervised learning task, but so far, we haven't been able to see any deep generative model successfully memorize something more complex than MNIST.