HN Mirror


	by tragictrash 1532 days ago
	I've been sitting here with my mouth wide open for 5 minutes unable to move past what you just showed me. I can't fathom that this exists.

1 comments

gfodor 1532 days ago

DALL-E 2 isn't the first superhuman AI, but it is the first capable of teaching the whole world just what that means for all of us.

tragictrash 1532 days ago

I've been casually following this space for a while (as a full-stack web/mobile engineer, nothing to do with AI) and this feels substantially different from what I've seen before.

Would you have names or links for some other projects you're aware of? Would love to check them out.

andybak 1532 days ago

GPT-3 is surely as jaw dropping as this?

d110af5ccf 1532 days ago

No, GPT-3 still produces gibberish at times. The majority of the good examples still ramble like a schizophrenic person. Much of the output is uncanny, interesting, and impressive in its own right but I wouldn't describe it as human level.

DALL-E 2 is different from what I've seen. The things it produces seem to actually make sense the majority of the time. The outputs are strikingly similar to what a competent human might output as opposed to one with a severe mental illness.

I'm sure part of this is an inherent advantage that DALL-E enjoys regarding context. Art is supposed to be artistic whereas text is expected to maintain long distance logical consistency of abstract concepts across a stream of output and also to communicate something concrete. So in a sense the bar for art is probably lower in many ways.

andybak 1532 days ago

The difference is I've had the chance to play with GPT-3 extensively and I've only got second-hand access to DALL-E 2.

GPT-3 amazes me and occasionally disappoints me. But it's still something I never thought I'd see in my lifetime. I suppose I'm still putting GPT-3 and DALL-E 2 in the same ballpark because they are both so far beyond what I thought would be possible from what are essentially brute-force methods.

gitfan86 1532 days ago

You cannot absorb words as fast as pictures. GPT-3 is more impressive, as it seems to have a much broader depth of understanding of context. The disadvantage of GPT-3 is that it is sometimes very wrong, as with simple math problems.

tiluha 1532 days ago

Interestingly, DALL-E is really bad at spelling. It knows what letters look like, but struggles with words.

walnutclosefarm 1532 days ago

Yes, and if you look at the "blue cube on a red cube beside a yellow sphere" example, it's clear that there are other areas where it simply lacks the semantic basis to correctly handle a request that has to be right in a non-image sense. It knows letters, and that letters come in sequences related to things it might paint, but it has no very good dictionary mapping those sequences to things; it knows how to draw a cube and a sphere, but the semantics of "on" and "beside" are largely absent.

I don't think that is terribly surprising, nor a very cogent detraction from the model.

tragictrash 1532 days ago

Very interesting observation!!!

danw1979 1532 days ago

Did GPT-3 write this comment?
