by Oras 1066 days ago

According to the website, the model can then be fine-tuned for specific tasks such as image classification.

1. How does multimodal training on text, images, and audio combined help improve the accuracy of image classification?

2. What about speed? I would imagine a model trained on text, audio, and image data would be larger than a text-only model?