Hacker News
by dekhn 1474 days ago
I think the tweeter is being a bit too pedantic. Personally, after seeing this paper, I spent some time thinking about embeddings, manifolds, the structure of language, scientific naming, and what the decoding of the points near the center of clusters in embedding spaces looks like (archetypes). I think making networks and asking them to explain themselves using their own capabilities is a wonderful idea that will turn out to be a fruitful area of research in its own right.
3 comments

I concur that the tweeter is being pedantic.

This is largely some embedding of semantics that we currently do not fully have a mapping for, precisely because it was generated stochastically.

Saying it was "not true" seems like clickbait.

If DALL-E had a choice to output "Command not understood", maybe we wouldn't be discussing this.

Like those AIs that guess what you draw [1] and recognize random doodling as "clouds", DALL-E is probably taking the least unlikely route. A gibberish word may be drawn as a bird simply because the scores were something like "bird (2%), goat (1%), radish (1%)".

1. https://quickdraw.withgoogle.com

That's extremely optimistic. When faced with gibberish, the "confidences" are routinely 90%+, just as with "meaningful" input.

It's almost as if it's an illusion designed to fool us, the users: by only providing inputs meaningful to us, we come to the foolish idea that it understands these inputs.

This is a good point. The fact that DALL-E will try to render something, no matter how meaningless the input, is a trait it has in common with many neural networks. If you want to use them for actual work, they should be able to fail rather than freestyle.
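
To make "fail rather than freestyle" concrete, here is a minimal sketch of a classifier with a reject option. It is not how DALL-E works; the labels, logits, threshold, and the function name classify_or_reject are all made up for illustration. Note the caveat raised above: if the reported confidences are 90%+ even on gibberish, a plain threshold like this one would not be enough on its own.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_or_reject(logits, labels, threshold=0.5):
    # Hypothetical helper: return (label, confidence), or (None, confidence)
    # when the top class is not confident enough to commit to.
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None, probs[best]
    return labels[best], probs[best]

labels = ["bird", "goat", "radish"]  # illustrative labels only

# One class clearly dominates: the model commits.
confident = classify_or_reject([5.0, 0.0, 0.0], labels)

# Near-uniform scores (the "bird 2%, goat 1%" situation): the model refuses.
unsure = classify_or_reject([0.1, 0.0, 0.0], labels)
```

With these toy numbers the first call commits to "bird" and the second returns None, which is the "Command not understood" behaviour suggested earlier in the thread.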
Especially since his results confirm most of what the original thread claimed. A couple of the inputs did not reliably replicate, but "for the most part, they're not true" seems straightforwardly false. He even seems to deliberately ignore this sometimes, such as when he says "I don't see any bugs" when there is very obviously a bug in the beak of all but two or three of the birds.
When I zoomed in, I felt only four in ten birds clearly had anything in their beaks, and in each case it looked like vegetable matter. In the original set, only one clearly has an insect in its beak.

Are there higher-resolution images to be had?

Lower in the same thread he accepts that his main tweet was clickbaity, and that actually there's consistency in some of the results.
Not really; afterwards he says he was mostly trying to inject some humility. He really doesn't think this is measuring anything of interest. For the birds result in particular, see https://twitter.com/BarneyFlames/status/1531736708903051265.
If I read what that tweet says properly, the system ended up outputting things that were almost scientific nomenclature for the general class of items it was being asked to draw. There are probably many examples of "bird is an instance of class X" in the text but they are not consistent, and the resulting token vector is a point near the center of "birdspace".
Yes. Indeed, it seems to interpret a lot of nonsense tokens it doesn't recognize as though it's probably the Latin / scientific term for some sort of species it doesn't remember very well (keeping in mind that all these systems are attempting to compress a large corpus into a relatively small space). I think https://twitter.com/realmeatyhuman/status/153173904648934195... is best illustrative of this phenomenon.

So, it's certainly an "interesting" result in the sense that it shows how these kinds of systems work, but it's definitely not a language.
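
The "point near the center of birdspace" idea above can be sketched with toy vectors. Everything here is an assumption for illustration: the three-dimensional embeddings, the species names, and the "gibberish" vector are invented, and real systems use learned embeddings with hundreds of dimensions.

```python
import math

def centroid(vectors):
    # Component-wise mean of a list of equal-length vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, 0.0 means orthogonal.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings for a few bird names.
birds = {
    "sparrow": [0.9, 0.1, 0.0],
    "finch": [0.8, 0.2, 0.1],
    "heron": [0.7, 0.0, 0.3],
}
bird_center = centroid(list(birds.values()))

# A hypothetical nonsense token that happens to land inside the bird cluster,
# and an unrelated vector for comparison.
gibberish = [0.8, 0.1, 0.1]
vegetable = [0.0, 0.9, 0.2]

sim_gibberish = cosine(gibberish, bird_center)
sim_vegetable = cosine(vegetable, bird_center)
```

In this toy setup the nonsense token is far more similar to the bird centroid than the unrelated vector is, so a decoder conditioned on it would plausibly produce a generic bird rather than any particular species.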

When questioned about the change of tone, he answers "Well... a little bit of twitter hype makes a thread go a long way".

https://twitter.com/emnode/status/1531852124501553153

> asking [neural networks] to explain themselves using their own capabilities

Exactly. This could be profound. I'm looking forward to further work here. Sure, the examples here are daft, but developing this approach could be like understanding a talking lion [0] only this time it's a lion of our making.

[0] https://tzal.org/understanding-the-lion-the-in-joke-of-psych...

I think it’s more likely we can train two neural networks: one to make the decision, and one to take the same inputs (or the same inputs plus the output from the first one) and generate plausible language to explain the first. This seems to correspond to what we dimwits call consciousness, and frankly I doubt one system can accurately explain its own mechanism. People surely can’t.
It’s a fruitful area of research for sure, but there is a huge gap between “it invented pig Latin” and “it invented Esperanto/Lojban”. Referring to the first as inventing a language is very misleading.