Hacker News new | ask | show | jobs
by karxxm 987 days ago
Some architectures are relatively well understood. Eg in CNNs, the first layers detect low level features like edges, gradients, etc. The next layer then combines these features to more complex structures like corners or circles. Next layer will combine these features to even higher level features and so on. [1]

Typically, you can take a pre-trained model and retrain it on your new dataset by only changing the weights of the last layer(s).

Some loss functions even measures the difference between the high-level features of two images, typically extracted from a pre-trained CNN (Perceptual Loss).

[1]Matt Zeiler did an amazing work on these findings 10 years ago (https://arxiv.org/abs/1311.2901).