|
|
|
|
|
by jofi1
1024 days ago
|
|
> It is something I have been wondering about: why did Meta not keep the training process going on while the loss curves seemed to go down? Could they conceivably release a Llama 2.1 being checkpoints taken a month after 2.0 was 'cut'? Maybe the expected gain is too small compared to what can be gained with fine/instruct tuning afterward anyway? Because choosing the LR decay requires knowing the # of steps in advance. LR is too small after the 2T tokens, and changing it afterwards doesn't tend to help. https://twitter.com/sherjilozair/status/1687837844729966592 |
|