Oh, there's plenty of good explanation in the neural network literature (my ELI5: the skip connections make the default mapping an identity instead of a zero mapping, so you can start by doing no harm and improve from there). The method was suggested by insights from differential equations. All I'm saying is that the "everything is secretly an SVM" viewpoint is probably too coarse to explain these interesting and effective structural differences.
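To make the "identity instead of zero" point concrete, here's a toy sketch in plain Python. It assumes a made-up block made of one dense layer followed by a ReLU, with all weights at zero to stand in for a near-zero initialization. The function names (`plain_block`, `residual_block`, `euler_step`) are hypothetical and only for illustration. Without the skip connection the block maps everything to zero. With it, the block passes its input straight through. The last function shows the differential-equations angle: `x + f(x)` is exactly one forward Euler step of size 1 for `dx/dt = f(x)`.

```python
def relu(v):
    # Elementwise ReLU on a list of floats.
    return [max(0.0, a) for a in v]

def dense(W, b, x):
    # Toy dense layer: W x + b, with W as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def plain_block(W, b, x):
    # No skip connection: output is f(x) alone.
    return relu(dense(W, b, x))

def residual_block(W, b, x):
    # Skip connection: output is x + f(x).
    return [xi + fi for xi, fi in zip(x, relu(dense(W, b, x)))]

def euler_step(f, x, h):
    # One forward Euler step for dx/dt = f(x): x + h * f(x).
    return [xi + h * fi for xi, fi in zip(x, f(x))]

x = [1.5, -2.0]
W0, b0 = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]  # "untrained" zero weights

print(plain_block(W0, b0, x))     # [0.0, 0.0]: default is the zero mapping
print(residual_block(W0, b0, x))  # [1.5, -2.0]: default is the identity

# With nonzero weights, a residual block equals an Euler step with h = 1.
W1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
f = lambda v: relu(dense(W1, b1, v))
print(residual_block(W1, b1, x) == euler_step(f, x, 1.0))  # True
```

Real networks use learned matrices, batch norm, and so on, but the structural point survives: training a residual block only has to learn a correction on top of "pass the input through", not the whole mapping from scratch.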