
by PoignardAzur 1476 days ago

Wait, how does that make any sense?

I thought DALL-E's language model operated on tokens, so it doesn't understand that e.g. "car" is made up of the letters 'c', 'a' and 'r'.

So how could the generated pictures contain letters spelling out words that, once tokenized, mean something in DALL-E's internal "language"? Shouldn't we expect that feeding those words back to the model would give the same result as feeding it random invented words?

Actually, now that I think about it, how does DALL-E react when given words made of completely random letters?
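
To make the tokenization point concrete, here's a toy sketch of what I mean. The vocabulary is made up for illustration, and it uses a greedy longest-match split rather than real BPE merge rules, so it is not DALL-E's actual tokenizer. It only shows the general idea: a common word becomes a single opaque ID, while a random string falls back to smaller pieces.

```python
# Hypothetical toy vocabulary, NOT DALL-E's real BPE vocabulary.
TOY_VOCAB = {"car": 0, "c": 1, "a": 2, "r": 3, "x": 4, "q": 5, "z": 6, "v": 7}


def tokenize(word, vocab=TOY_VOCAB):
    """Greedy longest-match split of `word` into token IDs (a stand-in for BPE)."""
    ids = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            # No vocabulary entry covers the character at position i.
            raise ValueError(f"no token covers {word[i]!r}")
    return ids


print(tokenize("car"))   # [0]
print(tokenize("xqzv"))  # [4, 5, 6, 7]
```

With a vocabulary like this, the model only ever sees `0` for "car" and never the letters themselves, whereas a random string like "xqzv" arrives as a sequence of small pieces.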