Hacker News new | ask | show | jobs
by londons_explore 1152 days ago
You'll probably finetune for one step for each of the 5 examples. Choose the learning rate carefully to get the results you want without much forgetting.

Total time is mere seconds.

If you save the adam optimizer parameters from previous runs, you'll do even better at preventing forgetting.