|
For some reason this comment from someone else was deleted: "My first reaction to this was, "It probably has to do with tokenization. If there's a 'language' buried in here, its native alphabet is GPT-3 tokens, and the text we see is a concatenation of how it thinks those tokens map to Unicode text."
Most randomly concatenated pairs of tokens simply do not occur in any training text, because their translation to Unicode doesn't correspond to any real word. There are also combinations that do correspond to real words ("pres" + "ident" + "ial") but still never occur in training because some other tokenization is preferred to represent the same string ("president" + "ial"). Maybe DALL-E 2 is assigning some sort of isolated (as in, no bound morphemes) meaning to tokens, e.g., combinations of letters that are statistically likely to mean "bird" in some language when more letters are revealed. When a group of such tokens is combined, you get a word that's more "birdlike" than the word "bird" could ever be, because it's composed exclusively of tokens that mean "bird": tokens that, unlike "bird" itself, never describe non-birds (e.g., a Pontiac Firebird). The exact tokens it uses to achieve this aren't directly accessible to us, because all we get is poorly rendered roman text." I wonder if this is why the term for "bird" seemed to be in faux binomial nomenclature, the format used for the scientific names of animals. I assume that the training set included images of birds and insects labeled with their scientific names. An image labeled with a scientific name would always be an image of an animal, unlike images labeled with the word "bird", which could show a birdhouse, a Pontiac Firebird, or someone playing golf. That would mean that when DALL-E wants to represent a bird as accurately as possible in the latent space, it will use the scientific name, or a gibberish/tokenized version of the scientific name, much like someone trying to make up a regal-sounding name might say "Sir Reginard Swellington III". Even though it's not a real name, it encodes into the latent space of royal-sounding names. I wonder if this could be extended to other things with very specific naming conventions. For example, aircraft names: "Gruoeing B-26 Froovet" might encode into the latent space of military aircraft. |
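
To make the tokenization point in the quoted comment concrete, here is a toy Python sketch. It is only an illustration under loud assumptions: the vocabulary is made up, and greedy longest-match is a simplified stand-in for the merge-based BPE that real tokenizers use. The names `VOCAB`, `tokenize`, and `is_canonical` are hypothetical and do not come from any library.

```python
# Toy vocabulary (an assumption for illustration, not a real model's vocabulary).
VOCAB = {"president", "pres", "ident", "ial"} | set("abcdefghijklmnopqrstuvwxyz")


def tokenize(text, vocab=VOCAB):
    """Greedy longest-match tokenization: a simplified stand-in for real BPE."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring first, then shorter ones.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens


def is_canonical(tokens, vocab=VOCAB):
    """True if the tokenizer itself would produce exactly this token sequence."""
    return tokenize("".join(tokens), vocab) == list(tokens)


alternative = ["pres", "ident", "ial"]
print(tokenize("presidential"))                # the preferred split
print("".join(alternative) == "presidential")  # same string when decoded
print(is_canonical(alternative))               # but never emitted by the tokenizer
```

In this toy setup, "pres" + "ident" + "ial" decodes to a real word, yet the tokenizer never emits that sequence, so a model trained on tokenized text would never see it. That is the sense in which some token combinations are "valid" but absent from training.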