|
|
|
|
|
by irthomasthomas
14 days ago
|
|
It means for 4.7 they trained a new base model with different architecture, different pre-training data (later knowledge cutoff), and a new tokenizer.
Vs finetuning an existing model, which was the case for 4.6, and probably for 4.8. |
|
btw where do they tell you how they trained the model.