Why it works isn't completely inscrutable. To follow the thread of deep learning theory research, start from the names here: http://www.vision.jhu.edu/tutorials/CVPR17-Tutorial-Math-Dee...