IMHO a CNN is a generalization of what SIFT does: a CNN could in principle be trained to behave like SIFT, but it can also be trained to learn features that are more specific to your use case.
Sure, in terms of expressivity, you can obtain much better results with a CNN. But that very often comes at the cost of computational efficiency: SIFT descriptors are "easy" to compute.
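To make the "generalization" point a bit more concrete, here is a minimal sketch (not the real SIFT pipeline: no scale space, keypoint detection, Gaussian weighting or 4x4 spatial cells). It only shows that the core of a SIFT-style descriptor, image gradients followed by an orientation histogram, can be written as a convolution with fixed, hand-set kernels plus a pooling step. The names `conv2d_valid`, `KX`, `KY` and `orientation_histogram` are made up for this illustration.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation, the operation a CNN conv layer applies."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hand-set "weights": central-difference filters along x (columns) and y (rows).
# In a CNN these would be learnable parameters instead of constants.
KX = np.array([[0, 0, 0], [-1, 0, 1], [0, 0, 0]]) * 0.5
KY = KX.T

def orientation_histogram(patch, n_bins=8):
    """Magnitude-weighted histogram of gradient orientations over one patch."""
    gx = conv2d_valid(patch, KX)
    gy = conv2d_valid(patch, KY)
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % (2 * np.pi)  # map angles to [0, 2*pi)
    bins = np.floor(angle / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist  # L2-normalize, guard flat patches

# Usage: a patch whose intensity increases from left to right.
ramp = np.tile(np.arange(6, dtype=float), (6, 1))
print(orientation_histogram(ramp))
```

Here the filters are fixed by hand, which is roughly why this kind of descriptor is cheap to compute. A CNN would replace `KX` and `KY` with many learned filters stacked over several layers, which is where both the extra expressivity and the extra cost come from.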