Hacker News
by behnamoh 1100 days ago
This is insane! But it makes me wonder if we've reached a local maximum in AI where the current methods are great at generating still images but they're pretty much uncontrollable. Like if you ask the AI to generate a dog, is it really possible to prompt every single detail so it creates exactly what you have in mind, or is it more like a trust situation where you just accept whatever the AI generates for you?
4 comments

Pure text->image is impossible to get exactly, given there are 10000 possibilities for a dog. Even if text prompts eliminate 99.9% of the possibilities, there are still 10 possible images.

However, with stuff like ControlNet, it's already possible, and will be solved within a year. Yes, you can specify every exact detail, but you need to feed it a sketch, or a skeletal pose, or a reference image of the dog...

Also, you can train a LoRA on the subject beforehand if you want to consistently regenerate the subject with just text.

> 10000 possibilities for a dog

I suppose it's not your main point, but that number is off by... probably about 10,000 orders of magnitude.
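
For anyone who wants to put rough numbers on this, here is a back-of-the-envelope sketch. The 512x512 RGB size and both function names are my own assumptions, not anything from the thread, and the second figure counts every possible pixel grid (mostly noise), so it is only a loose upper bound on "possible dog images".

```python
from fractions import Fraction
import math


def remaining(total: int, eliminated: Fraction) -> Fraction:
    """Candidates left after a prompt rules out a fraction of them."""
    return total * (1 - eliminated)


def log10_pixel_images(width: int, height: int,
                       channels: int = 3, levels: int = 256) -> float:
    """log10 of the number of distinct raw pixel grids of this size.

    There are levels ** (width * height * channels) grids, so we work
    in log space to avoid building an enormous integer.
    """
    return width * height * channels * math.log10(levels)


# The parent comment's arithmetic: 10000 candidates, 99.9% eliminated.
left = remaining(10_000, Fraction(999, 1000))

# Assumed image size (hypothetical): 512x512, 8-bit RGB.
exponent = log10_pixel_images(512, 512)

print(f"candidates left: {left}")
print(f"raw 512x512 RGB grids: about 10^{exponent:,.0f}")
```

So the 10000 -> 10 step checks out as arithmetic; the disagreement is entirely about how large the starting number is.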

It's a human problem really. If you asked an artist to draw a dog, you'd have to "trust" them too. To control every detail you'd have to tell them every detail, either upfront (in which case the artist might struggle to achieve it) or as a series of edits. In the latter case both artist and AI would struggle to keep the look consistent the more edits you make.
At some point you might as well just do it yourself cause it’s easier. I’ve gotten to this point more than once with ChatGPT. ChatGPT will get you like 70% of the way initially and if you are lucky you’ll hit 90% with a lot of time invested in “prompt engineering”.

The thing is, none of these are mind readers. And text is a very poor way to define tight specs. The best way for software is to code it yourself. Code is the spec. Same with drawing. The spec is the drawing itself. Only the human can control that.

…or something…

ControlNet begs to differ
https://vcai.mpi-inf.mpg.de/projects/DragGAN/ Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold