by rd42
628 days ago
I think the only relevant part to note here is that this model showed improved text-only performance after multimodal training. I wonder if this translates to Llama models as well. Is it possible to extend Llama 3.1 405B with multimodal training to create another SOTA large model?
Allowing the language model weights to be updated during training could potentially result in better performance on both tasks, though, if Nvidia's result replicates. I could believe that it might: after all, more diverse data is more diverse data, and the model will be forced during training to generalize more.