by jacquesm 289 days ago
> it's still a human at the end of the day picking what to prompt

I think that prompts like 'Dutch people skating on a lake' or 'girl with a pearl earring' or 'Dutch religious couple in front of their barn' would produce just noise from an AI that had not been trained on various works. And if those particular works (you know the ones, right?) were not part of the input, then the AI would never produce anything looking like the original, no matter how specific you made the prompt. It takes human input to animate it, and even then what it produces does not look original to me, whereas any five-year-old is able to produce entirely original works of art, none of which can be reduced to a prompt.

Prompts are instructions. They are settings on a mixer; they are not the music produced by the artists at the microphones.

Have you actually used image generators today? It can produce things it's never seen if only you describe the constituent pieces. Prompts are a compressed version of the image one wants to create, and these days you don't even need "prompts" per se, you can say, make a woman looking towards the viewer, now add a pearl earring, now adjust this and that etc.

> Have you actually used image generators today?

Why would you ask this? It sounds like a lead-up to some kind of put-down.

> It can produce things it's never seen if only you describe the constituent pieces.

It can produce things it's never seen based on lots of things that it has seen.

> Prompts are a compressed version of the image one wants to create

They emphatically are not. They are instructions to a tool on what relative importance to assign to all of the templates that it was trained on. But it doesn't understand the output image any more than it understood any of the input images. There is no context available to it in the purest sense of the word. It has no emotion to express because it doesn't have emotions in the first place.

> and these days you don't even need "prompts" per se, you can say, make a woman looking towards the viewer, now add a pearl earring, now adjust this and that etc.

That's just a different path to building up the same prompt. It doesn't suddenly cause the AI to use red for a dress because it thinks it is a nice counterpoint to a flower in a different part of the image, because it does not think at all.

I think you're reading too much into my comment. It's not a put-down. I'm genuinely asking because it seems many people still think anyone serious about AI just types prompts into Midjourney, but it's become a lot more complex than that, akin to electronic music production. Producers haven't played every single note their synths synthesize on a physical instrument, yet their arrangement of the notes is what makes them producers, and so too with AI workflows such as those seen in ComfyUI. Anyone not familiar with these workflows might not understand where the field is today.

Regarding prompts, I never said a computer "understands" or is "emotional" about an image. I don't think anyone on either side of the debate actually thinks that, so I'm not sure why you're bringing it up. By "compressed" I just meant it in the information-theory sense: if you have a specific series of words, a given temperature, and other settings for a given model, it will deterministically produce the same image, hence the set of those attributes can be thought of as a compressed representation of that image. I made no claims about it thinking whatsoever.
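
To make the "compressed" point concrete, here is a minimal toy sketch. It is not a real image model: `toy_generate` is a hypothetical stand-in, and the `seed`, `steps`, and `size` parameters are my assumptions for "other settings". It only illustrates that when the output is a pure function of prompt plus settings, that small tuple is enough to reproduce the output exactly. Real pipelines may not be bit-for-bit reproducible across different hardware or library versions.

```python
import hashlib
import random


def toy_generate(prompt: str, seed: int, steps: int = 4, size: int = 8) -> list[list[int]]:
    """Hypothetical stand-in for an image model: a pure function of its inputs."""
    # Fold every input that affects the output into one reproducible RNG state.
    key = f"{prompt}|{seed}|{steps}|{size}".encode("utf-8")
    rng = random.Random(int.from_bytes(hashlib.sha256(key).digest(), "big"))
    # A size x size grid of grayscale values plays the role of the "image".
    return [[rng.randrange(256) for _ in range(size)] for _ in range(size)]


# The "recipe" is a few dozen bytes; the output it regenerates is much larger
# for any realistic image size.
recipe = ("woman looking towards the viewer, pearl earring", 42)
image_a = toy_generate(*recipe)
image_b = toy_generate(*recipe)
print(image_a == image_b)  # same recipe, same output
```

Note that the recipe only reproduces the image together with the model itself, which is where the bulk of the information lives.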

> It can produce things it's never seen based on lots of things that it has seen.

Yes, just like humans, as I said in my initial comment about the same old arguments being repeated since 2022, when Stable Diffusion came out. But again, that's tiresome, so let's not repeat it here too.