Hacker News new | ask | show | jobs
by lonesword 1150 days ago
No finetuning for just a few facts could indeed end up being very costly. If you have 5 new examples that you want to fine-tune your model on, you probably wont fine-tune your existing model for 3000 training steps on just those 5 new examples. You'll either mix in other data to prevent catastrophic forgetting, or you'll probably training from scratch after fixing your dataset to reflect the 5 new examples you have.
1 comments

You'll probably finetune for one step for each of the 5 examples. Choose the learning rate carefully to get the results you want without much forgetting.

Total time is mere seconds.

If you save the adam optimizer parameters from previous runs, you'll do even better at preventing forgetting.