|
|
|
|
|
by singularity2001
1189 days ago
|
|
become more creative with a fraction of the data consumed by GPT-4
not if you understand the input stream of vision as an equivalent input stream of semantic tokens as in multimodal models. under that definition people looking around for 10 years receive much more training data than large language models and thus perform a bit better at zero shot inference. |
|