| I don't want to be dismissive of Dall-E itself or its authors. Just the implications that this changes everything or how it is much more than it really is. https://twitter.com/nickcammarata/status/1512123067803344899... Prompt: "expressive painting of a man shining rays of justice and transparency on a blue bird twitter logo" You have to break the concepts up apart (which is one of the things Dall-E improved on). As such: "expressive blue bird" In google image search, type clipart, and I even get pill tags to further narrow it down to illustrations for animal paintings and so forth. Google's classifier knows the concept of a "blue bird" and expressionism too. https://www.google.com/search?q=expressive+blue+bird&tbm=isc... The same for "ray of light". In fact the top results there I get pngs of sun beams on a transparent background. Which is perfect. Neither the birds nor the rays of light in the pictures it produced are truly its own creations but lifted from bits of pictures in its training set. I bet you could find the exact bird from the second row online in many places for example. It just won't be blue or stylized. Composite those things together manually and add a style transfer you'll get similar results to DALL-E as that is what it is doing more or less. |
If you actually try doing this, it quickly becomes clear that this assertion is incorrect.
1. The way the elements of the images are integrated goes deeper than style. For instance, see the image in the top row, second column: it has integrated the blue bird wings onto the man, not simply grafting them on, but making them appear draped like a cloak, partly behind and partly in front of him (and this is consistent with the man's posture and the rays of light, evoking a coherent cultural idea/image). You might be able to integrate multiple images (of man, bird, rays etc.) together and apply style transfer to arrive at a poor approximation of this, but even then, the decision to place the elements together in such a way would require creativity on your part.
2. The one example set of trial images (generated from the phrase "expressive painting of a man shining rays of justice and transparency on a blue bird twitter logo") is among the easiest in the full group to pick apart into its various elements; if you try this thought experiment with the others in the thread, you'll see this idea is far from sufficient.