| HN Mirror

	by EvgeniyZh 1522 days ago
	You could expect gibberish to be distributed uniformly in latent space, disconnected from its linguistic counterpart. After all, those are textual inputs that the model has never seen, and it can't even properly map words it has seen many times to their written form in an image: the word "seafood" and an image of seafood are in the same place in latent space, but an image of the written word "seafood" isn't. Yet an image of some written gibberish word is, and so is the same gibberish word as text. It's very counterintuitive to me.
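	To make "in the same place in latent space" concrete, here is a minimal sketch using cosine similarity. The three vectors are hypothetical toy values chosen for illustration, not real embeddings from DALL-E 2 or any other model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (assumed values, not model output).
text_seafood = [1.0, 0.0, 0.0]            # the word "seafood" as text
image_of_seafood = [0.9, 0.1, 0.0]        # a picture of seafood
image_of_written_word = [0.0, 1.0, 0.0]   # a picture of the written word

# Close to 1: text and picture land in the same region.
print(cosine(text_seafood, image_of_seafood))
# 0: the picture of the written word lands somewhere unrelated.
print(cosine(text_seafood, image_of_written_word))
```

	The surprising claim is that for gibberish, the analogue of the second similarity appears to be high rather than low.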

1 comments

TOMDM 1522 days ago

A uniform distribution makes sense for gibberish, not something I'd considered.

A counterpoint I'd raise: I wonder how aggressive Dall-E 2 is in making assumptions about words it hasn't seen before.

That's hard to test given that it has read essentially the entire internet; however, someone could make up some Latin-esque words whose meaning people would be able to guess.

If the model is as good as people at guessing the meaning of such made-up words, it stands to reason that, if it is aggressive enough about this, it might be doing the same thing with gibberish and ending up with its own interpretation of the word, which would land it back in a more targeted concept space.

I'd love to see someone craft some words that most people could guess the meaning of, and see how Dall-E 2 fares.

astrange 1522 days ago

Prior art with GPT2: https://www.thisworddoesnotexist.com/

ksaj 1521 days ago

This might be considerably different, and calling it "prior art" fails to consider what is actually going on here. The appearance may be similar, but lots of things can look similar while being completely distinct. And this is indeed such a case.

One of the words I got was "charlite" for the pale green colour of charcoal used as a dye. Charlite might not be a real word, but it is made up the same way a real word would be.

The method is important, because "charlite" probably came about by specifically asking GPT2 for a definition of the non-word "charlite."

In fact, this shows up in the source code examples:

  # definition for a word you make up
  print(word_generator.generate_definition("glooberyblipboop"))

This is literally the opposite of what OP presented: with the GPT2 examples we know where the "defined" word comes from, which means that was a demo of GPT2 trying to work out a human-provided word. It is literally a function of the program: generate_definition(). It was specifically written to do that.

But with the DALL-E 2 examples, we don't know where the words come from, even though they are internally consistent. As far as we can tell, it's an internal phenomenon not based on intentional human input.

Having said that, GPT2 probably has the same phenomenon. But the link you provided is not demonstrating that.

TOMDM 1522 days ago

OK, so a proposed study design: provide a sample of these, along with obscure English words, to a number of individuals, and get them to try to pick out the real words.

From there, take the fake words that people most often judged to be real.

Select a number of those words and get Dall-E 2 to try and make images of them, then see how many of those images contain results that represent the imaginary word.

If anyone who has access to Dall-E 2 wants to try this, I would _love_ to see the results.
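
The bookkeeping for that design could look roughly like the sketch below. Everything in it is an assumption for illustration: the function names, the data layout, the word "aquavore", and the sample votes are all made up, and the image judgments would have to come from human raters looking at the generated images.

```python
from collections import defaultdict

def rank_fake_words(responses, fake_words, top_k=3):
    """Return the fake words most often judged real, best first.

    responses: iterable of (participant_id, word, judged_real) tuples.
    """
    votes = defaultdict(lambda: [0, 0])  # word -> [judged_real, total]
    for _participant, word, judged_real in responses:
        if word in fake_words:
            votes[word][0] += int(judged_real)
            votes[word][1] += 1
    rates = {w: real / total for w, (real, total) in votes.items() if total}
    # Sort by descending rate, breaking ties alphabetically.
    return sorted(rates, key=lambda w: (-rates[w], w))[:top_k]

def hit_rate(judgments):
    """Per word, the fraction of images judged to depict the guessed meaning."""
    return {w: sum(j) / len(j) for w, j in judgments.items() if j}

# Made-up sample data; "petrichor" is a real word and is ignored in the ranking.
fake = {"charlite", "glooberyblipboop", "aquavore"}
responses = [
    ("p1", "charlite", True), ("p2", "charlite", True), ("p3", "charlite", False),
    ("p1", "glooberyblipboop", False), ("p2", "glooberyblipboop", False),
    ("p1", "aquavore", True), ("p2", "aquavore", True), ("p3", "aquavore", True),
    ("p1", "petrichor", True),
]
selected = rank_fake_words(responses, fake, top_k=2)
print(selected)  # ['aquavore', 'charlite']
print(hit_rate({"aquavore": [True, True, False, True]}))  # {'aquavore': 0.75}
```

Comparing the hit rate for plausible-looking fake words against the hit rate for pure gibberish would be the interesting number.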

totetsu 1522 days ago

Apparently you can suggest prompts to their Instagram account.