No, they all use the same general principle of backpropagation to compute the gradients for training. Different flavours of optimizers exist, each with its own tweaks and additions to how those gradients are applied, in order to speed training up.
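To make that concrete, here is a rough sketch in plain Python rather than any particular framework's API. The toy loss `(w - 3)^2`, the function names, and the hyperparameters are all made up for illustration. Both optimizers call the same `grad` function (the part backpropagation would supply in a real network) and differ only in how they turn it into a weight update.

```python
def grad(w):
    # Gradient of the toy loss (w - 3)^2. In a real network this value
    # would come from backpropagation; here it is written out by hand.
    return 2.0 * (w - 3.0)


def sgd(w, steps=300, lr=0.1):
    # Plain gradient descent: step directly against the gradient.
    for _ in range(steps):
        w -= lr * grad(w)
    return w


def momentum(w, steps=300, lr=0.1, beta=0.9):
    # Same gradient, different update rule: accumulate a running
    # "velocity" of past gradients and step along that instead.
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(w)
        w -= lr * v
    return w


w_sgd = sgd(0.0)
w_mom = momentum(0.0)
print(w_sgd, w_mom)  # both should end up close to the minimum at w = 3
```

Swapping one optimizer for the other changes nothing about how the gradient is obtained, only what is done with it afterwards.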
So it's not common to use a layer-by-layer training approach for deep nets? I thought that was one of the main things that made a huge difference and enabled the "deep" revolution. Anyway, aren't vanishing gradients still a problem? If so, how do people use these frameworks for deep nets? If not, how was the problem resolved? I thought vanishing gradients were an issue for anything with more than 2 or 3 layers.