by rhdunn
176 days ago
If you are finetuning the model, you need to replicate the training conditions so you don't remove those capabilities. If you finetune a multi-modal model on text alone, it will lose some of its vision capabilities, because the text part of the model drifts away from the vision, audio, etc. models. A similar thing happens when finetuning reasoning models. Even if you finetune on both text and images, you can run into problems if your image descriptions differ from the ones the model was trained with. You could probably work around that by getting the model to describe the images itself, but you would still need to audit the results to correct any issues or add what you are training for. You can also run into overfitting if your data does not cover enough of the variation that the original training set had. Using different training parameters could also affect the model's capabilities. Just knowing things like the input context isn't enough.

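As a rough illustration of the "replicate the training conditions" point: one common way to limit drift is to mix some examples of the original modalities back into the finetuning set instead of training on text alone. The sketch below is only that, a sketch. The function name `build_finetune_mix`, the `replay_fraction` parameter, and the dict-shaped examples are all hypothetical, and it assumes you have image+text pairs available that resemble what the model originally saw.

```python
import random


def build_finetune_mix(new_examples, replay_examples, replay_fraction=0.3, seed=0):
    """Return a shuffled list in which roughly `replay_fraction` of the
    items come from `replay_examples` (hypothetical stand-ins for data
    resembling the original training mix, e.g. image+text pairs)."""
    if not 0.0 <= replay_fraction < 1.0:
        raise ValueError("replay_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_replay / (n_new + n_replay) = replay_fraction for n_replay.
    n_replay = round(len(new_examples) * replay_fraction / (1.0 - replay_fraction))
    # Never ask for more replay examples than are available.
    n_replay = min(n_replay, len(replay_examples))
    mix = list(new_examples) + rng.sample(list(replay_examples), n_replay)
    rng.shuffle(mix)
    return mix


# Hypothetical data: 70 new text-only examples, 100 image+text replay examples.
text_only = [{"modality": "text", "id": i} for i in range(70)]
image_text = [{"modality": "image+text", "id": i} for i in range(100)]
mix = build_finetune_mix(text_only, image_text, replay_fraction=0.3)
```

The right fraction is an assumption you would have to tune, and this does nothing about the description-style mismatch or the training parameters mentioned above.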
On the other hand, RL on deployed systems looks promising as a way to essentially JIT-optimize models. Experiments with model routers and agentic RAG have shown good results.
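To make the model router idea concrete, here is a minimal sketch of a router that learns online from feedback, using a plain epsilon-greedy bandit. This is a swapped-in textbook technique chosen for brevity, not the method used in any of the experiments mentioned. The class name `EpsilonGreedyRouter`, the model names, and the reward values are hypothetical, and the reward is assumed to be some scalar quality signal you collect after each response.

```python
import random


class EpsilonGreedyRouter:
    """Pick a model per request and learn from scalar rewards over time."""

    def __init__(self, model_names, epsilon=0.1, seed=0):
        self.model_names = list(model_names)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {name: 0 for name in self.model_names}
        self.mean_reward = {name: 0.0 for name in self.model_names}

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best mean so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.model_names)
        return max(self.model_names, key=lambda name: self.mean_reward[name])

    def update(self, name, reward):
        # Incremental running mean: no need to store the reward history.
        self.counts[name] += 1
        self.mean_reward[name] += (reward - self.mean_reward[name]) / self.counts[name]


# Hypothetical usage: two models, feedback arrives after each response.
router = EpsilonGreedyRouter(["small", "large"], epsilon=0.0)
router.update("small", 0.2)
router.update("large", 0.9)
router.update("large", 0.5)
best = router.choose()
```

A real deployment would presumably condition on the request (a contextual bandit or a learned policy) instead of keeping one global average per model.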