by Oras
1066 days ago
According to the website, the model can then be fine-tuned for certain tasks such as image classification.

1. How does the multimodal training help improve the accuracy of image classification when the training data combines text, images, and audio?

2. How about the speed? I would imagine a model trained on text, audio, and image data would be larger than a text-only model?