| HN Mirror


by jawarner 1521 days ago

The tweet is in response to a preliminary paper [1] [2] studying the text found in images generated from prompts such as "Two whales talking about food, with subtitles." DALL-E doesn't generate meaningful text strings in the images, but if you feed the gibberish text it produces ("Wa ch zod ahaakes rea.") back into the system as a prompt, you get semantically meaningful images, e.g., pictures of fish and shrimp.

[1] https://giannisdaras.github.io/publications/Discovering_the_...

[2] https://twitter.com/giannis_daras/status/1531693093040230402

2 comments

dekhn 1521 days ago

I think the tweeter is being a bit too pedantic. Personally, after seeing this paper I spent some time thinking about embeddings, manifolds, the structure of language, scientific naming, and what the decoding of the points near the center of clusters in embedding spaces looks like (archetypes). I think making networks and asking them to explain themselves using their own capabilities is a wonderful idea that will turn out to be a fruitful area of research in its own right.

lotaezenwa 1521 days ago

I concur that the tweeter is being pedantic.

This is largely some embedding of semantics that we currently do not fully have a mapping for, precisely because it was generated stochastically.

Saying it was "not true" seems like clickbait.

ASalazarMX 1521 days ago

If DALL-E had a choice to output "Command not understood", maybe we wouldn't be discussing this.

Like those AIs that guess what you draw [1], and recognize random doodling as "clouds", DALL-E is probably taking the least unlikely route. That a gibberish word is drawn as a bird is maybe because the scores were "bird (2%), goat (1%), radish (1%)".

[1] https://quickdraw.withgoogle.com
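To make the "least unlikely route" idea concrete, here is a minimal sketch. The labels and percentages are the made-up ones from the comment above, not real DALL-E outputs, and `top_label` is a hypothetical helper. The point is only that picking the top-scoring label always returns something, however weak the score.

```python
def top_label(scores):
    """Return the (label, score) pair with the highest score."""
    return max(scores.items(), key=lambda kv: kv[1])

# Hypothetical scores for a gibberish prompt; the remaining probability
# mass is assumed to be spread thinly over many other labels.
scores = {"bird": 0.02, "goat": 0.01, "radish": 0.01}

label, score = top_label(scores)
print(label, score)
```

Nothing in this selection step can say "command not understood"; it just reports the winner.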

mjburgess 1520 days ago

That's extremely optimistic. When faced with gibberish, the "confidences" are routinely 90%+, just as with "meaningful" input.

It's almost as if it's an illusion designed to fool us, the users. By only providing inputs meaningful to us, we come to the foolish idea that it understands these inputs.

superjan 1520 days ago

This is a good point. The fact that DALL-E will try to render something, no matter how meaningless the input, is a trait it has in common with many neural networks. If you want to use them for actual work, they should be able to fail rather than freestyle.
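A rough sketch of what "fail rather than freestyle" could look like for a plain classifier: abstain when the top softmax probability is below a threshold. The function names, the labels, and the 0.5 threshold are all assumptions for illustration, and this is not how DALL-E works internally. As the parent comment points out, it also only helps if the scores really are low on meaningless input.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_or_abstain(logits, labels, threshold=0.5):
    """Return the top label, or None if no label is confident enough."""
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return None  # the "fail" path: refuse instead of guessing
    return labels[best]

labels = ["bird", "goat", "radish"]
print(predict_or_abstain([0.1, 0.0, 0.05], labels))  # near-uniform scores
print(predict_or_abstain([5.0, 0.0, 0.0], labels))   # one dominant score
```

If the model is overconfident on gibberish, the second case is what you get and the threshold never fires, so a real rejection mechanism would need something beyond the raw softmax output.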

koboll 1521 days ago

Especially since his results confirm most of what the original thread claimed. A couple of the inputs did not reliably replicate, but "for the most part, they're not true" seems straightforwardly false. He even seems to deliberately ignore this sometimes, such as when he says "I don't see any bugs" when there is very obviously a bug in the beak of all but two or three of the birds.

mannykannot 1521 days ago

When I zoomed in, I felt only four in ten birds clearly had anything in their beaks, and in each case it looked like vegetable matter. In the original set, only one clearly has an insect in its beak.

Are there higher-resolution images to be had?

ASalazarMX 1521 days ago

Lower in the same thread he accepts that his main tweet was clickbaity, and that actually there's consistency in some of the results.

Jweb_Guru 1521 days ago

Not really; afterwards he says he was mostly trying to inject some humility. He really doesn't think this is measuring anything of interest. For the birds result in particular, see https://twitter.com/BarneyFlames/status/1531736708903051265.

dekhn 1520 days ago

If I read what that tweet says properly, the system ended up outputting things that were almost scientific nomenclature for the general class of items it was being asked to draw. There are probably many examples of "bird is an instance of class X" in the text but they are not consistent, and the resulting token vector is a point near the center of "birdspace".
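As a toy illustration of "a point near the center of birdspace": average a few embedding vectors and look up the closest entry in a vocabulary. The 2-D vectors, the token names, and the helpers `centroid` and `nearest` are all invented for this sketch; real text encoders use far higher-dimensional embeddings, and I'm not claiming this is what DALL-E does.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def nearest(point, vocab):
    """Name of the vocabulary entry closest to point (Euclidean distance)."""
    return min(vocab, key=lambda tok: math.dist(point, vocab[tok]))

# Hypothetical embeddings of three different "bird" phrases.
bird_vectors = [[1.0, 0.0], [0.8, 0.2], [0.9, 0.1]]

# Hypothetical vocabulary: one made-up token that happens to sit in the
# middle of the bird cluster, and one unrelated token far away.
vocab = {"made_up_token": [0.88, 0.12], "goat": [0.0, 1.0]}

center = centroid(bird_vectors)
print(nearest(center, vocab))
```

In this toy setup the token nearest the cluster center need not be a real word at all, which is roughly the reading of the tweet given above.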

ASalazarMX 1520 days ago

When questioned about the change of tone, he answers "Well... a little bit of twitter hype makes a thread go a long way".

https://twitter.com/emnode/status/1531852124501553153

austinjp 1521 days ago

> asking [neural networks] to explain themselves using their own capabilities

Exactly. This could be profound. I'm looking forward to further work here. Sure, the examples here are daft, but developing this approach could be like understanding a talking lion [0] only this time it's a lion of our making.

[0] https://tzal.org/understanding-the-lion-the-in-joke-of-psych...

lanstin 1520 days ago

I think it’s more likely we can train two neural networks: one to make the decision, and one to take the same inputs (or the same inputs plus the output from the first one) and generate plausible language to explain the first. This seems to correspond to what we dimwits call consciousness, and frankly I doubt one system can accurately explain its own mechanism. People surely can’t.

LudwigNagasena 1520 days ago

It’s a fruitful area of research for sure, but there is a huge gap between “it invented pig Latin” and “it invented Esperanto/Lojban”. Referring to the first as inventing a language is very misleading.

numpad0 1521 days ago

> "Wa ch zod ahaakes rea."

“Watch those sea creatures.”?

nomel 1521 days ago

Are you claiming it has learned to read using Hooked on Phonics? No wonder it can't spell!