|
|
|
|
|
by aesthesia
70 days ago
|
|
The new tokenizer is interesting, but it definitely is possible to adapt a base model to a new tokenizer without too much additional training, especially if you're distilling from a model that uses the new tokenizer. (see, e.g., https://openreview.net/pdf?id=DxKP2E0xK2). |
|
They also changed the image encoder, so I'm thinking "new base model". Whatever base that was powering 4.5/4.6 didn't last long then.