by hakuseki
1204 days ago
This is just a guess, but I don't think there's such a deep lesson here. Language models and image models have simply been developed by mostly different groups of researchers, who chose different tradeoffs. In an alternate history, it may very well have gone the other way around.

Simplifying a bit, mapping (which is essentially the main goal of image generators, and especially of transformer-based generators) is just a less complex task than prediction. It's similar to how bilingual LLMs can be much better translators than traditional systems that map one sentence directly to another sentence: https://github.com/ogkalu2/Human-parity-on-machine-translati...