> Is it always good enough to take the outputs of the next-to-last layer as features?

It usually doesn't matter all that much whether you take the next-to-last or the third-from-last layer; they all perform pretty similarly. If you're transferring to a task that's very dissimilar from the pretraining task, I think it can sometimes help to take the first dense layer after the convolutional layers instead, but I can't find the paper where I remember reading that, so take it with a grain of salt.

> When doing quick iterations, I assume the images in the data set have been run through the big net as a preparation step?

Yep. (And, crucially, you don't have to run them through again on every iteration.)

> And the inputs to the net you're training is the features? Does the new net always only need 1 layer?

Yeah, you take the activations of a late layer of the pretrained net and use them as the input features to the new model you're training. That new model can be as complicated as you like, but usually a simple linear model performs great.

> What are some examples of where this worked well (except for the flowers mentioned in the article)?

The first paper in the post (https://arxiv.org/abs/1403.6382) covers about a dozen different tasks.
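
To make the "run the big net once, then iterate on a linear model" workflow concrete, here's a minimal sketch in plain Python so it runs without downloading anything. Big assumptions: `FrozenFeatureExtractor` is a hypothetical stand-in for the pretrained net (a fixed random projection plus ReLU, not a real convnet), `make_toy_data` produces fake "images", and the L2 normalization and logistic regression by gradient descent are just illustrative choices, not the exact setup from the linked paper.

```python
import math
import random


class FrozenFeatureExtractor:
    """Hypothetical stand-in for a big pretrained net cut off at a late layer.

    A real pipeline would return, say, next-to-last-layer activations of a
    pretrained convnet. Here a fixed random projection followed by a ReLU
    plays that role so the sketch runs without any downloads.
    """

    def __init__(self, in_dim, feat_dim, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)]
                        for _ in range(feat_dim)]
        self.calls = 0  # counts forward passes, to show caching works

    def __call__(self, image):
        self.calls += 1
        acts = [max(0.0, sum(w * v for w, v in zip(row, image)))
                for row in self.weights]
        # L2-normalize the activations (an assumed, but common, choice).
        norm = math.sqrt(sum(a * a for a in acts)) or 1.0
        return [a / norm for a in acts]


def precompute_features(extractor, images):
    """Preparation step: run every image through the frozen net exactly once."""
    return [extractor(image) for image in images]


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_classifier(features, labels, epochs=300, lr=1.0):
    """Binary logistic regression trained by full-batch gradient descent.

    Only the cached feature vectors are touched, so each iteration is cheap
    and the pretrained net is never run again.
    """
    dim = len(features[0])
    w, b = [0.0] * dim, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for f, y in zip(features, labels):
            err = _sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y
            for j in range(dim):
                grad_w[j] += err * f[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, f):
    # Linear decision rule on a cached feature vector.
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0


def make_toy_data(n_per_class, in_dim=4, seed=1):
    """Hypothetical toy 'images': two well-separated clusters of small vectors."""
    rng = random.Random(seed)
    xs, ys = [], []
    for label, centre in ((1, 2.0), (0, -2.0)):
        for _ in range(n_per_class):
            xs.append([centre + rng.uniform(-0.5, 0.5) for _ in range(in_dim)])
            ys.append(label)
    return xs, ys
```

The point of the split is that `precompute_features` is the only place the expensive net runs; you can call `train_linear_classifier` as many times as you like with different `epochs` or `lr` against the same cached list, and `extractor.calls` won't move.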