by password54321
171 days ago
>Also if you want to have more semantics, you add image, video and audio to your model. It gets smarter because of it. I think you are confusing generation with analysis. As far as I am aware, your model does not need to be good at generating images to be able to decode an image.
Now there are all sorts of tricks to get the output of this to be good, and maybe they shouldn't be spending time and resources on this. But the core capability is shared.