|
|
|
|
|
by lonesword
1150 days ago
|
|
No finetuning for just a few facts could indeed end up being very costly. If you have 5 new examples that you want to fine-tune your model on, you probably wont fine-tune your existing model for 3000 training steps on just those 5 new examples. You'll either mix in other data to prevent catastrophic forgetting, or you'll probably training from scratch after fixing your dataset to reflect the 5 new examples you have. |
|
Total time is mere seconds.
If you save the adam optimizer parameters from previous runs, you'll do even better at preventing forgetting.