Hacker News
by ocharles 972 days ago
As I understand it, the point is that although these models are _trained_ to identify cats or cars, they have seen so much variation during training that they have internalised many different concepts that help them arrive at "it's a cat". The idea is then to take all of the pre-trained weights that make up this classifier and add your own custom head on top of the network. This saves you a huge amount of training for what is essentially feature extraction, since that part is already done. All you need is a bit more training to work out how to use these learnt features. I could be way off the mark, but that's how I understand it.
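
To make the "frozen features plus a new head" idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the "backbone" is just a fixed random matrix with a ReLU standing in for real pre-trained weights, the task is a toy one, and the names (`backbone_w`, `extract_features`, `train_head`) are hypothetical. The only point is that training updates the head and never touches the backbone.

```python
import math
import random

random.seed(0)
IN_DIM, FEAT_DIM = 4, 8

# Stand-in for pre-trained weights (random here; normally loaded from a checkpoint).
backbone_w = [[random.gauss(0, 1) for _ in range(IN_DIM)] for _ in range(FEAT_DIM)]
backbone_snapshot = [row[:] for row in backbone_w]  # kept only to show it stays frozen

def extract_features(x):
    """Frozen feature extractor: ReLU(W x). Nothing below modifies backbone_w."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in backbone_w]

# The custom head: one logistic unit. These are the only trainable parameters.
head_w = [0.0] * FEAT_DIM
head_b = 0.0

def head_output(feats):
    z = sum(w * f for w, f in zip(head_w, feats)) + head_b
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))

# Toy task (an assumption): label is 1 when x[0] + x[1] > 0.
data = []
for _ in range(200):
    x = [random.uniform(-1, 1) for _ in range(IN_DIM)]
    data.append((x, 1.0 if x[0] + x[1] > 0 else 0.0))

def mean_loss():
    """Average binary cross-entropy over the toy data set."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = head_output(extract_features(x))
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

def train_head(epochs=50, lr=0.05):
    """Plain SGD on the head only; the backbone is used but never updated."""
    global head_b
    for _ in range(epochs):
        for x, y in data:
            feats = extract_features(x)
            err = head_output(feats) - y  # gradient of the loss w.r.t. the logit
            for i, f in enumerate(feats):
                head_w[i] -= lr * err * f
            head_b -= lr * err

loss_before = mean_loss()
train_head()
loss_after = mean_loss()
print("loss before:", loss_before, "loss after:", loss_after)
print("backbone unchanged:", backbone_w == backbone_snapshot)
```

In a real framework the same idea usually amounts to marking the pre-trained parameters as non-trainable and handing only the head's parameters to the optimiser.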

Yes, your understanding is correct. However, instead of adding a head on top of the network, most fine-tuning is currently done with LoRA (https://github.com/microsoft/LoRA). LoRA adds small low-rank matrices alongside the existing weight matrices in selected layers of the model. Only these are trained on your data, while the rest of the model's weights stay frozen.
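
For anyone who wants to see the mechanics, here is a rough sketch of the low-rank update on a single linear layer, in plain Python rather than with the linked library. The dimensions, the toy target and all names (`W`, `A`, `B`, `forward`, `train`) are assumptions for illustration. The layer computes `W x + SCALE * B (A x)`, where `W` is frozen and only the small factors `A` and `B` are trained. `B` starts at zero, so the adapted layer initially behaves exactly like the original one.

```python
import random

random.seed(0)
D_IN, D_OUT, RANK = 6, 6, 2
SCALE = 1.0  # plays the role of alpha / r; fixed to 1 here for simplicity

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# Frozen "pre-trained" weight matrix (random stand-in values).
W = [[random.gauss(0, 1) for _ in range(D_IN)] for _ in range(D_OUT)]
W_snapshot = [row[:] for row in W]  # kept only to show W stays frozen

# Trainable low-rank factors: A is random, B is zero at the start.
A = [[random.gauss(0, 0.3) for _ in range(D_IN)] for _ in range(RANK)]
B = [[0.0] * RANK for _ in range(D_OUT)]

def forward(x):
    """Frozen layer output plus the low-rank correction B (A x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + SCALE * u for b, u in zip(base, update)]

# Toy target (an assumption): the frozen layer plus a rank-1 shift to be learnt.
u = [random.uniform(-1, 1) for _ in range(D_OUT)]
v = [random.uniform(-1, 1) for _ in range(D_IN)]

def target(x):
    vx = sum(a * b for a, b in zip(v, x))
    return [w + ui * vx for w, ui in zip(matvec(W, x), u)]

data = [[random.uniform(-1, 1) for _ in range(D_IN)] for _ in range(200)]
initial_matches_base = forward(data[0]) == matvec(W, data[0])

def mean_loss():
    """Average of half the squared error over the toy data set."""
    return sum(
        0.5 * sum((p - t) ** 2 for p, t in zip(forward(x), target(x)))
        for x in data
    ) / len(data)

def train(epochs=40, lr=0.02):
    """SGD on A and B only; W is read but never written."""
    for _ in range(epochs):
        for x in data:
            h = matvec(A, x)
            err = [p - t for p, t in zip(forward(x), target(x))]
            # B^T err, computed before B is updated in this step
            bt_err = [sum(B[i][k] * err[i] for i in range(D_OUT)) for k in range(RANK)]
            for i in range(D_OUT):
                for k in range(RANK):
                    B[i][k] -= lr * SCALE * err[i] * h[k]
            for k in range(RANK):
                for j in range(D_IN):
                    A[k][j] -= lr * SCALE * bt_err[k] * x[j]

loss_before = mean_loss()
train()
loss_after = mean_loss()
trainable = RANK * (D_IN + D_OUT)
print("loss before:", loss_before, "loss after:", loss_after)
print("trainable params:", trainable, "vs full matrix:", D_IN * D_OUT)
```

At this toy size the saving is small, but the trainable parameter count grows as `RANK * (D_IN + D_OUT)` rather than `D_IN * D_OUT`, which is where the benefit comes from on large layers.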