Hacker News
by petralithic 292 days ago
Have you actually used image generators today? They can produce things they've never seen, if only you describe the constituent pieces. Prompts are a compressed version of the image one wants to create, and these days you don't even need "prompts" per se: you can say, make a woman looking towards the viewer, now add a pearl earring, now adjust this and that, etc.

> Have you actually used image generators today?

Why would you ask this? It sounds like a lead-up to some kind of put-down.

> It can produce things it's never seen if only you describe the constituent pieces.

It can produce things it's never seen based on lots of things that it has seen.

> Prompts are a compressed version of the image one wants to create

They emphatically are not. They are instructions to a tool on what relative importance to assign to all of the templates that it was trained on. But it doesn't understand the output image any more than it understood any of the input images. There is no context available to it in the purest sense of the word. It has no emotion to express because it doesn't have emotions in the first place.

> and these days you don't even need "prompts" per se: you can say, make a woman looking towards the viewer, now add a pearl earring, now adjust this and that, etc.

That's just a different path to building up the same prompt. It doesn't suddenly cause the AI to choose red for a dress because it thinks red is a nice counterpoint to a flower elsewhere in the image. It does not think at all.

I think you're reading too much into my comment. It's not a put-down; I'm genuinely asking, because many people still seem to think that anyone serious about AI just types prompts into Midjourney. It has become a lot more complex than that, akin to electronic music production: producers haven't played every single note on a physical instrument (their synths synthesize them), yet their arrangement of the notes is what makes them producers. So too with AI workflows such as those built in ComfyUI. If one is not familiar with them, one might not understand where the field is today.

Regarding prompts, I never said a computer "understands" or is "emotional" about an image. I don't think anyone on either side of the debate actually believes that, so I'm not sure why you're bringing it up. By "compressed" I just meant it in the information-theoretic sense: given a specific series of words, a fixed random seed and the other sampling settings, a given model will deterministically produce the same image, so that set of attributes can be thought of as a compressed representation of the image. I made no claims about it thinking whatsoever.

> It can produce things it's never seen based on lots of things that it has seen.

Yes, just like humans, as I said in my initial comment about the same old arguments being made since 2022, when Stable Diffusion came out. But again, that's tiresome, so let's not repeat it here too.