> If Google is to be believed, this outperforms Dall-E - and I’ve heard from people that use it that in general, it does perform better than Dall-E.
Google has actually since surpassed its own model[0], first with Parti[1], which, unlike DALL-E, could even render text correctly in the image, and then with Imagen Video[2], which generates video, as the name implies.