by EvgeniyZh
1475 days ago
You would expect gibberish to be distributed uniformly in latent space, disconnected from its linguistic counterpart. After all, these are textual inputs the model has never seen, and it can't even properly map words it has seen many times to their written form in an image: the word "seafood" and an image of seafood are in the same place in latent space, but an image of the written word "seafood" isn't. Yet an image of some written gibberish word is, and so is the same gibberish word as text. That's very counterintuitive to me.
|
As a counterpoint, I wonder how aggressively DALL-E 2 makes assumptions about words it hasn't seen before.
That's hard to test, given that it has read essentially the entire internet, but someone could make up some Latin-esque words whose meaning people would be able to guess.
If the model is as good as people at inferring the meaning of such made-up words, and it is aggressive enough about it, it might be doing the same thing with gibberish. It would then end up with its own interpretation of the word, which would land it back in a more targeted region of concept space.
I'd love to see someone craft some words whose meaning most people could guess, and see how DALL-E 2 fares.