Sane & interesting enough to have been disproven, by Boaz Barak iirc. Maybe not surprising since simulated annealing never achieved the results of gradient descent + backprop.
What makes statistical mechanics so brilliant is that it takes first-principles ideas (particle energies + ensembles) and derives the macroscopic thermodynamic rules from them, all of which were originally obtained from observation.
What the OP is proposing is that a mathematical analysis of SGD + generic deep learning architectures might be able to derive the rules we have so far only found empirically, through experiments in model training.