| HN Mirror

That's actually closer to how deep learning started. Initially, deep learning mostly consisted of unsupervised (task independent) features with a linear classifier on top. We had to fit an unsupervised model (e.g. autoencoder) layer by layer before using the feature layers in a supervised task.

This was because we didn't understand how to train a deep model end-to-end until later. When we learned how to make that end-to-end training work it tended to perform better because the learned features were task specific.

You can still learn general features in a bunch of ways, in addition to the older method using autoencoders. For one example, multiple supervised heads with auxiliary losses can learn more generalize features.