by kristjank
1066 days ago
Yo dawg, we heard you like transformers so we put transformers on your transformers so you can train while you train.
The spider web graph shows metatransformers performing worse than their counterparts in almost all fields. Is there a reason I should not believe that an expert model will always outperform a general-purpose one, even if it's a metatransformer?
I think it aims to leverage the cross-modal relationships and unified learning, which might not be possible with expert models designed for only a single modality.
Even if it performs slightly worse on some tasks, the ability to handle multiple modalities within a single framework is a pretty sweet advantage in scenarios where data from various sources needs to be processed simultaneously and patterns across modalities need to be captured.
A general-purpose model could also be a more cost-effective solution in some cases, since ensembles of experts are difficult to scale and parallelize.