Hacker News
by redredrobot 1473 days ago
So his argument is that the text clearly maps to concepts in the latent space, but when composing them the results are unexpected, so it isn't language? Why isn't this better described as 'the rules of composition are unknown'?

That framing is worse because it hides an assumed conclusion, i.e. that there are rules of composition.
But don't we already know that composition exists in DALL-E? Don't the points shown in the tweet indicate that some form of composition exists? The 3D renders are clearly render-like, and the paintings and cartoons are clearly in the appropriate style.
"That there exist rules of composition of the hypothesized secret DALL-E language" is a much stronger claim than that it "understands" composition of text in the real languages it was trained on.

Though I'll also point out that even the evidence for that weaker claim is tenuous. It definitely knows how to move an image closer to "3D render" in concept-space, but it doesn't seem to understand the linguistic composition of your request. For example, you'd have an extremely hard time getting it to generate an image of a person using 3D rendering software, or a "person in any style that isn't 3D render"; it would probably just make 3D renders of people.

I haven't played around with it myself; I'm going off the experiences of others. For example:

https://astralcodexten.substack.com/p/a-guide-to-asking-robo...

I don't think I've ever experienced such a disconnect between someone's critique and my own. These "poor" examples are still completely amazing to me!