Hacker News
by sanman811 1532 days ago
This tweet provides important context: https://twitter.com/nickcammarata/status/1512119623315075081...

They weren’t just copying and pasting prompts; there was human creativity involved as well.


It is important context, but just to push back against people over-correcting on this, my guess is that the ones he rejected also looked approximately this good.

I think the main reason people are wowed by this thread isn't the subtle effect of the cherry-picking he did, but the overall quality of any image generated by DALL-E 2.

Yeah, that’s right. There were very few strictly bad ones across the entire thread of generations.

The rejections were most commonly:

1. Kind of just slightly boring or literally drawing the thing rather than being cool and artistic

2. Cool but similar to the artistic style of bios near it in the thread, whereas I wanted to keep it diverse (surreal followed by literal, oil followed by sharp lines etc) so it's more fun to scroll through

Whereas a few years ago generative models (GANs etc.) would often render something like static noise, or completely wrong things. I've only seen that problem once with DALL-E across hundreds or thousands of images now (it generated a fully white image).

> 2. Cool but similar to the artistic style of bios near it in the thread, whereas I wanted to keep it diverse (surreal followed by literal, oil followed by sharp lines etc) so it's more fun to scroll through

Has anyone compiled a list of the styles and artists Dall-E "knows"? How niche does it get? Decorative Initial Caps? Florid Victorian Ornaments? Googie Architecture? SFF artists like Michael Whelan, Vincent Di Fate, Jeffrey Catherine Jones, or Jim Burns? Banksy? Sculptors like Bathsheba Grossman or Markus Pierson? Early animation artists like Ub Iwerks or E. C. Segar?

I was experimenting with one of the VQGAN+Clip notebooks a while ago, and it did pretty well with some styles, but not so much with "Heroic Realism" or "Soviet Propaganda Poster" or "Shepard Fairey", and even worse when I was trying to get it to draw, in that style, an object that could be construed as implying a style itself, like "retro robot" or "50s raygun" (e.g. "A retro robot drawn in a heroic realism style" or "A cubist painting of a steampunk pistol"). Is that kind of dissonance a problem for Dall-E?

Can you ask DALL-E to draw itself?
Yes, and it sees itself as a really cute little demon: https://mobile.twitter.com/gdb/status/1512521912064229377
Somehow this makes me feel a bit more at ease about this whole thing.
Maybe that's how they want you to feel.
Is that like asking GitHub’s code autocomplete to write a code autocompleter?
Asking Copilot to write Copilot. Hmmmmmm…
That's very cool but once you have stable image output how do you define good image output when it comes to art?

The stuff on deviantart is pretty good too and neatly tagged and classified by art style.

I’d often send like six images to the person whose bio I was making and ask them to choose two :)
For better results send three, two being static noise :)
https://twitter.com/nickcammarata/status/1512123067803344899...

You're absolutely right, here he displays the full set for a given prompt. They all look fantastic!

I've been sitting here with my mouth wide open for 5 minutes unable to move past what you just showed me. I can't fathom that this exists.
DALL-E 2 isn't the first superhuman AI, but it is the first capable of teaching the whole world just what that means for all of us.
I've been casually following this space for a while (as a full stack web/mobile engineer, nothing to do with ai) and this feels substantially different than what I've seen before.

Would you have names or links for some other projects you're aware of? Would love to check them out.

GPT-3 is surely as jaw dropping as this?
Having worked with Nick extensively, take what he says with a grain of salt. He’s well known even by close friends to be a reality distorter, to put it softly.
Sir, this is a public discussion over a well-enough documented breakthrough with good-faith non-corporate actors on both sides of the original friend-oriented equation. There’s no practical nor epistemic need to hijack it as if we were all hanging out in the laundromat of your worldview.
This is a public forum discussing a public tweet made by an employee of a for-profit private company that sells you this technology. And said employee is a traveling salesman and consummate hype machine, acknowledged as such by his own best friends, and even by himself many times.

This is practical and epistemically relevant knowledge for anyone deciding how interesting these results are, given that they were originally presented without mentioning they were cherry-picked. I’m doing a favor by providing it, as doing so doesn’t exactly make me look great, but it is very much worth knowing for anyone following him.

Side note - there’s this SF club of effete intellectualistas who fashion themselves as modern day Florentines during a de novo renaissance. They do a lot of back-patting. They have exactly the mentality of your reply - be kind, love is all you need, etc.

It’s sort of the exact opposite of the east coast mentality that willingly sacrifices looking good and “getting along” in favor of finding the truth despite some discomfort. Discomfort to this group is very taboo.

Of course, this don’t-rock-the-boat mentality is very much intentional as it gives said club the ability to instantly shun anyone who deigns to critique it, allowing them to continue building their following.

Your first critique was ad-hominem.

Your second critique: assumes the original presentation had to be accompanied by methodology and proof to be of value; derives an implicit attempt-to-distort from your perspective of the scene at hand; devolves into paternalism to end in unsubstantiated moralism.

I might even agree with the spirit underlying your words—given, say, the meaning-loss of the company’s name—this just isn’t the way to convey it.

I’m adding to truth finding, in relevant context. My name’s in my profile. It’s a discussion forum - the opinions are the point.
If only James Randi was around. What a fantastic example of cold reading.

Gather round, gather round, give me a text, any text at all and I will produce you an image of some kind. And you will call it "good" if it looks like anything at all.

Because all art is subjective and your mind will work overtime to connect it back to the text you provided.

Even if the text just serves as random entropy, it's alright for people to feel a subjective connection between the artwork and the text.
Does OpenAI have a GUI that you're using or is that a CLI?
Now imagine, if you can, that the situation were reversed: the AI was adding cyberpunk/oil/etc. to the front of the prompt, and it was the human who was interpreting it and painting the many variations.

How many people would then be defending the AI, that actually it wasn't just the human, the AI was playing a critical role in the creative process, ne'er to be replaced? I venture zero people would say that.

Ah, I wish this fact had been highlighted better. Not a criticism of the tweet author; it's just that twitter threads really aren't designed to convey context.
I hate them with every part of my soul! It's so sad to see the internet has moved from people making blog posts to share interesting things to just spraying it on Twitter in batches for maximum interactions.
Of course not. I'm no longer surprised by just how eager people are to believe an "AI" will read their minds or has magical qualities and a mind of its own. Even on HN.

Jiggle the imagination just a little bit, dangle some progress, and we're off to the races.

This is "I'm feeling lucky" on google image search + style transfer + trial and error.

If you think I am being dismissive try a few of these twitter bios as searches and see for yourselves.

I guess it fits with the times we live in. Reward shallow plagiarism. Outsource your mind.

It isn't theft if you can automate it.

Autotune for the deaf, Dall-E for the blind.

I tried what you suggested for a bunch of the twitter bios and found nothing except links back to this thread. I also reverse image searched a bunch of them to see if DALL-E was just kind of pasting together large chunks of images, but never found anything close. I do think you're being dismissive but please post any examples of what you mean. I'm a skeptic and have been waiting to find out that this is just a glorified parlor trick, but so far it seems like DALL-E is doing everything the authors claim, which is remarkable.
It's fascinating how in our hubris we were thinking that art would be the last thing for AI to tackle, but it appears to be the first (Sam Altman made a similar statement on the launch of DALL-E). Which makes art more meaningful to me, for some reason. There's something in the billion parameters and exabytes of data that this neural net had to process and it was so ... easy. Natural. Because it is us. It is our expression. Our creativity. Our outpouring of data, and all it is doing is reflecting us. It's beautiful.
I'm an amateur painter and AI hasn't even kissed high art yet IMO, although it's nominally good at amateur illustration.

I'm going to have to write up a piece on this sometime, my argument is a little too involved for an HN post. But the gist is that the heart of what a fully trained painter does is make personal choices. A quick and dirty example of the difference:

Suppose you train an AI on Picasso's pre-1901 pieces. It's not going to decide it's time for a blue period.

That’s because the entire corpus of art is stored in the neural network weightings as memory. It’s built to imitate human art by optimizing towards these weightings.
There is something about clouds. Facial recognition software frequently finds faces in clouds just like we do.

This thing will kill art dead.

Check my post prior to this one. Art will be fine.

Well, representational art. I'd like to say post-abstract expressionists are at risk, but they still have their admirers convinced they're wearing clothes, and there's no indication those idiots will ever change their minds.

I agree with your post. Good luck convincing every child with a big Dall-E button on their iPad that they are not in fact Picasso. I mean, just look at this thread. And these are supposed to be adults.
Yeah I just tried google image searching to find something like the pikachu photo from https://mobile.twitter.com/gottapatchemall/status/1511777860...

But I can't find anything close to the realism that DALL-E 2 achieved here.

There was an abomination of a live action Pikachu movie some time ago. When I google "realistic pikachu" I get images exactly like this from the movie but not gross.

In fact this photo is exactly what you get when you photoshop the face of an ugly chihuahua onto a Pikachu plushie head and add a yellow brushed hamster body. And a cape. Literally that is what you're looking at.

It understood your prompt and amalgamated the right source photos into this nightmare fuel. Jesus wept.

Yeah, it's still impressive to be able to imitate those styles and add a blue cape that didn't exist in the movies, along with chihuahua eyes. It also appears to be higher definition than Detective Pikachu CG. I'm curious if you could do the same for all 150 original Pokemon, even those for which realistic CG representations don't exist. Would it be able to take the cartoon version of Farfetch'd or Psyduck or a more obscure one and achieve the same realism, without the reference from the deep dataset?
Well to my eye it's realism beyond anything that I could find. Mind you I didn't search for that long so there might be something there if I was to delve deeper.

I am pretty familiar with photoshop, and while I'm not an expert, I would find making something like this really difficult. Anything is possible with photoshop, but some things are very hard.

> In fact this photo is exactly what you get when you photoshop the face of an ugly chihuahua onto a Pikachu plushie head and add a yellow brushed hamster body. And a cape. Literally that is what you're looking at.

I guess some people are overhyped, but it's cool that this can do that. Previously, it took a trained human.

If this is the exact image you wanted and are entirely satisfied with it, great. But what people are reacting to is that it is outputting interesting images at all.

What are you going to do with this cape wearing realistic Pikachu that is actually a picture of a hamster?

Typically the trained human has something specific in mind. And if the client isn't satisfied they will torture them with countless requests for adjustments. So right now this is of limited use.

To me what is far far far more interesting is that Dall-E possibly understands the concept of what a Pikachu is supposed to be. That is downright creepy, and fascinating. I suspect that this visual aspect to things after people get over the clipart generation might find more functional utility as a way to see through the "model eyes" so to speak. To visualize the model itself. That could unlock a lot of doors in how training is done.

Maybe in the future you could train it on textbooks and prompt it for a picture of a molecule. Now that would be something. Especially if you start feeding it data from experiments.

I don't want to be dismissive of Dall-E itself or its authors. Just the implications that this changes everything or how it is much more than it really is.

https://twitter.com/nickcammarata/status/1512123067803344899...

Prompt: "expressive painting of a man shining rays of justice and transparency on a blue bird twitter logo"

You have to break the concepts apart (which is one of the things Dall-E improved on).

As such: "expressive blue bird"

In google image search, type clipart, and I even get pill tags to further narrow it down to illustrations for animal paintings and so forth. Google's classifier knows the concept of a "blue bird" and expressionism too.

https://www.google.com/search?q=expressive+blue+bird&tbm=isc...

The same goes for "ray of light". In fact, in the top results there I get PNGs of sun beams on a transparent background, which is perfect.

Neither the birds nor the rays of light in the pictures it produced are truly its own creations but lifted from bits of pictures in its training set. I bet you could find the exact bird from the second row online in many places for example. It just won't be blue or stylized.

Composite those things together manually and add a style transfer, and you'll get similar results to DALL-E, as that is what it is doing more or less.

> Composite those things together manually and add a style transfer, and you'll get similar results to DALL-E, as that is what it is doing more or less.

If you try actually doing this it will be trivial to see that this assertion is incorrect.

1. The way in which the elements of the images are integrated together is deeper than the level of style. For instance, see the image in the top row, second column: it has integrated the blue bird wings onto the man, not only simply grafting them on, but giving the appearance of their being draped on like a cloak, partly behind and partly in front of him (+ it's consistent with the man's posture and the rays of light to evoke a certain coherent cultural idea/image). You might be able to integrate multiple images (of man, bird, rays etc.) together and style transfer to arrive at a poor approximation of this—but even then, the decision to place the elements together in such a way would require creativity on your part.

2. The one example set of trial images (generated from the phrase "expressive painting of a man shining rays of justice and transparency on a blue bird twitter logo") is one of the easiest among the full group to pick its various elements apart; if you try this thought experiment with the others in the thread, you'll see this idea is by far insufficient.

Good, finally. Yes, exactly - this is the most interesting aspect of the whole thing.

> the decision to place the elements together in such a way would require creativity on your part

I strongly suspect that's because it found similar compositions in its training set. So what exactly is going on here is fascinating.

Did it learn compositing? Is that why the image output is now much more stable? Or is it merely finding similar artwork and competently recreating/mimicking existing compositions from different building blocks? So now we can not only transfer styles but also transfer compositions. That could be the beginning of something useful. Instead of a text prompt I'd give it my crappy doodle and it will respond with an improved/different one that is comparable (also a great way to steal tho).

And of course I picked the one that is easiest to tease apart where it is most evident so people will see what I mean.

> if you try this thought experiment with the others in the thread, you'll see this idea is by far insufficient

That depends on your imagination and your artistic eye I guess. Even if somebody could do that they certainly couldn't make you believe them. That's the accomplishment.

Neither one of us can prove it one way or the other so long as the model is a black box. And certainly so long as we don't have direct access to openai but just to curated examples.

On (2), so this part is where I wonder: no-one has "expressive painting of a man shining rays of justice and transparency on a blue bird twitter logo" as their twitter bio. So are the "happy sisyphus" images generated from "happy sisyphus children's style", or are they generated from something more like "a person carries a large ball in a mellow image in the style of a pixar cartoon"? To me there is a huge difference between these things: how much of the context is inferred from the bio, and how much from what's provided in the prompt? (Does DALL-E 2 know about the story of Sisyphus or is that part filled in?)
In the video accompanying the paper they gave the example of "tree bark". Do we mean the bark of a tree or a dog barking at a tree?

So I reckon with "happy sisyphus" it breaks it apart into discrete vectors as a first disambiguation step and in this case resulting in two distinct queries.

Happy returns all kinds of image results.

Sisyphus returns the same kind of image results over and over.

A man rolling a boulder up a hill. Thus it can learn the concept of "sisyphus" on the fly as it would return:

man 95%, boulder 90%, hill 80%, etc.

Over a range of images.

So it must be Man+Boulder+Hill. That's its scene cue. That's what CLIP doodles initially. That's the "find me similar images" step.

Happy is the style cue.

That's how "happy sisyphus" expanded into "a person carries a large ball in a mellow image in the style of a pixar cartoon"

Why specifically the Pixar style? One of several variations it tried, selected by a human.

The thing we don't know is whether the Pixar styled image is composited from the existing images in its training set. In other words whether this can be reversed.

That character looks familiar tho. I think it is plagiarizing.

Here is another observation: the boulder is not round, it reminds me of one of the Platonic solids. I don't think that's a coincidence, heh.

They are generated from e.g. "happy sisyphus". My understanding is there are separate additional controls for style (though it's flexible enough you could give hints in the text, too, which is I gather where the "expressive" word fits in).
I think your last line is what stands out more than anything. You've just described creating something without "compositing those things together manually."

Note that in that example the "twitter bird logo" is actually expressed in 6 out of all of those images. Look for the small bird, that looks like the Twitter logo. It's there. It's doing the thing.

The prompt is actually "blue bird twitter logo".

Nothing is expressed. Find yourself a blue bird in an expressionistic style, go to google image search and give it the url. Click on tools -> visually similar.

Enjoy an endless supply of things to plagiarize. In the middle picture of the second row you can clearly see how several pre-existing images are sharply cut off before being re-blended.

Same thing going on here as in your other comments.

Tech like CLIP, GPT-3, DALL-E, etc. are indeed nearing the sophistication (w. caveats around outliers and harmful outputs) of Google search.

It took a lot of people to create Google search. It took precisely one training run for DALL-E 2 to create this.

edit: Removed toxic comment.

> Prompt: "expressive painting of a man shining rays of justice and transparency on a blue bird twitter logo"

Yeah, weak results. None of the men look anything like Elon Musk. /s

I've coincidentally just been watching Rick and Morty and this really fits when read in Rick's voice.

Is yawning at everything astonishing not just exhausting? Everything is "just" made up of less impressive things. But is this really not worthy of a little wonderment?

Middle-brow dismissal of everything seems so exhausting. I find it much easier to be moderately impressed by things, personally!
It is more like uni-brow dismissal of the eagerly impressed rather than the impressive work in progress.

It is very good to keep an open mind, just make sure your brain doesn't fall out.

I dunno, I really like the "happy sisyphus" one and I'm not seeing anything remotely as nice (or similar really) on Google Images[1]...

[1] https://www.google.com/search?q=happy+sisyphus

Side note: Those last 3 lines would make fantastic lyrics
:)

They call me the Hip-Hopapotamus

My lyrics are bottomless

A proposition for your consideration: what if you’re wrong?
What if you're in a cult?

Two more papers down the line who knows what Dall-E 4 will be capable of. It is a step in the right direction that the image output is now "stable", which is what this is demonstrating.

But it can't read your mind despite the eerie feeling you get; that is an illusion. Kismet in API form.

The next step is to open this black box up and actually make its internal pipeline tweakable so it can become a useful tool.

It may end up an amazing, super useful tool or a clipart plagiarizer/generator on steroids.

You can't even use it yet and you're already so eager to believe.

You don’t even know what I believe, but one thing is clear: you also haven’t used it yet, and are far more certain of its capabilities than I am. (I have, incidentally, had two of my personal requests generated by the kind folks at OpenAI, and I was impressed.)
I was responding to this: "They weren’t just copying/pasting prompts there was human creativity involved as well"

I'm simply certain that whatever its capabilities they are short of mind reading. You'd be equally impressed if you asked me to perform a google image search.

That does not mean that Dall-E is unimpressive or the results are fake. What I'm saying is that the hype and mysticism around this is unwarranted.

Elsewhere in the thread somebody else wrote that we are on the cusp of it producing convincing fake footage from the Kennedy assassination from a single text prompt.

The image output now being stable and pleasing to the eye is enough of a result even if it requires trial and error.

You wouldn't lose your mind over a wallpaper generator even though no machine learning is necessary to produce infinite variations of interesting patterns. This thing is spewing out "art" and people are ascribing magical capabilities to it as if it taped a banana to a canvas.

Anything is possible. Maybe Dall-E is capable of even more incredible things. Who knows where this all ends up. Sure. But not quite that much follows from what has been presented so far.

Based on what I have seen, DALL-E 2 does seem to be demonstrating something very close to, if not entirely mappable onto, human creativity when it comes to visual creation. There are several examples where it makes connections that are highly unlikely to be just a lift from another work, yet also create a work that makes a fundamental artistic statement. Here are two that blew my mind (again: presuming these aren't just cribbed from human artists in terms of semantics): https://twitter.com/gfodor/status/1511907134761361419
A wallpaper generator could be a rad application of this, actually. You could feed some random poetry into GPT and the outputs of GPT into the input of this, randomly pick an output, and every time you log in to your computer you'd get some surreal, never-before-seen image.

I'd dig it.

It’s a guy not a group.