Minor correction: Deep learning using gradient descent is incredibly robust to noise. If you know the mathematics, this also makes sense intuitively: gradients of incorrect labels will generally point in random directions, whereas the "truth" points in a specific direction (and I explicitly mean truth in the sense of what is portrayed consistently as fact in the dataset, not the real world truth). So when you accumulate gradients, you will end up with a net effect that moves weights only towards the consistent answers.
Since gradient descent is by far the most popular algorithm, it's easy to conflate these two things. But there are other approaches that don't treat noise so well.
Since gradient descent is by far the most popular algorithm, it's easy to conflate these two things. But there are other approaches that don't treat noise so well.