Clustering algorithms operate on features, which typically have to be designed by hand. The appeal of deep learning is that it discovers good features automatically.
Features are typically hand-crafted only because that's the practical thing to do. Neural nets can learn visual features directly from their input, without anyone designing them. See LISSOM (Laterally Interconnected Synergetically Self-Organizing Map) networks for a good example of self-organized feature learning.
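
To make "self-organized learning of features" concrete, here is a minimal sketch using a plain Kohonen-style self-organizing map. This is not LISSOM itself (LISSOM adds lateral connections and Hebbian learning, which are left out here), and the function names, the toy bar-patch data, and every parameter value are assumptions chosen for illustration.

```python
import numpy as np


def make_bar_patches(n, size=5, seed=0):
    """Toy 'visual' input (hypothetical): size x size patches with one bar each."""
    rng = np.random.default_rng(seed)
    patches = np.zeros((n, size, size))
    for p in patches:  # p is a view, so writes land in `patches`
        idx = rng.integers(size)
        if rng.random() < 0.5:
            p[idx, :] = 1.0  # horizontal bar
        else:
            p[:, idx] = 1.0  # vertical bar
    return patches.reshape(n, -1)


def train_som(data, grid=(4, 4), epochs=5, lr0=0.5,
              sigma0=1.5, sigma_end=0.5, seed=0):
    """Train a small self-organizing map; returns one weight vector per unit."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.random((rows * cols, data.shape[1]))
    # Position of each unit on the 2-D grid, used for the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)],
                      dtype=float)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                        # decaying learning rate
            sigma = sigma0 + (sigma_end - sigma0) * frac   # shrinking neighbourhood
            # Best-matching unit: the unit whose weights are closest to the input.
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            # Gaussian neighbourhood around the winner on the grid.
            grid_dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Pull the winner and its grid neighbours toward the input.
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


if __name__ == "__main__":
    data = make_bar_patches(200, size=5, seed=1)
    weights = train_som(data, grid=(4, 4), epochs=5, seed=0)
    # Each row, reshaped to 5x5, is the pattern that unit has come to prefer.
    print(np.round(weights[0].reshape(5, 5), 2))
```

No labels and no hand-designed features go in; the only supervision is the input itself. Reshaping each row of `weights` back to a patch lets you inspect what each unit has tuned itself to.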