by tragictrash 1532 days ago
I've been sitting here with my mouth wide open for 5 minutes unable to move past what you just showed me. I can't fathom that this exists.

DALL-E 2 isn't the first superhuman AI, but it is the first capable of showing the whole world just what that means for all of us.
I've been casually following this space for a while (as a full stack web/mobile engineer, nothing to do with AI) and this feels substantially different from what I've seen before.

Would you have names or links for some other projects you're aware of? Would love to check them out.

GPT-3 is surely as jaw dropping as this?
No, GPT-3 still produces gibberish at times. The majority of the good examples still ramble like a schizophrenic person. Much of the output is uncanny, interesting, and impressive in its own right but I wouldn't describe it as human level.

DALL-E 2 is different from what I've seen. The things it produces seem to actually make sense the majority of the time. The outputs are strikingly similar to what a competent human might output as opposed to one with a severe mental illness.

I'm sure part of this is an inherent advantage that DALL-E enjoys regarding context. Art is supposed to be artistic, whereas text is expected to maintain long-distance logical consistency of abstract concepts across a stream of output and also to communicate something concrete. So in a sense the bar for art is probably lower in many ways.

The difference is I've had the chance to play with GPT-3 extensively and I've only got second-hand access to DALL-E 2.

GPT-3 amazes me and occasionally disappoints me. But it's still something I never thought I'd see in my lifetime. I suppose I'm still putting GPT-3 and DALL-E 2 in the same ballpark because they are both so far beyond what I thought would be possible from what are essentially brute force methods.

You cannot absorb words as fast as pictures. GPT-3 is more impressive, as it seems to have a much broader and deeper understanding of context. The disadvantage of GPT-3 is that it is sometimes very wrong, as with simple math problems.
Interestingly DALL-E is really bad at spelling. It knows what letters look like, but struggles with words.
Yes, and if you look at the "blue cube on a red cube beside a yellow sphere" example, it's clear that there are other areas where it simply lacks the semantic basis to get a request right when the request needs to be correct in a non-image sense. It knows letters, and that letters come in sequences related to things it might paint, but it has no very good dictionary mapping those sequences to things; it knows how to draw a cube, and a sphere, but the semantics of "on" and "beside" are largely absent.

I don't think that is terribly surprising, nor a very cogent detraction from the model.

Very interesting observation!!!
Did GPT-3 write this comment?