Hacker News
by es7 1191 days ago
I tried this out today and was very impressed by the results.

Remember when DALL-E came out less than a year ago and people were amazed by the avocado armchair?

Between this and Midjourney v5, the quality of AI-generated art is rapidly approaching human level, and I can see it getting there very soon.

3 comments

Midjourney, DALL-E, and many of these generative AIs have been at human level for a long time.

The problem is - unlike a human - it's pretty hard to get them to do something close to what you have in mind. Sure, if you try a few dozen prompts - you'll probably eventually get something close to what you want.

And considering the cost of this will approach free - it's going to be hard for artists to compete.

I tried getting Midjourney to generate an image of a boy doing a high jump - and no matter what I tried - the boy is hurdling over the bar rather than high jumping over it.

The quality of the images is great - human-level. But it's not what I want.

I think we'll be stuck in this phase for a very long time, like self-driving cars.

Have you tried Stable Diffusion + ControlNet? That gives you a lot of control over the output. You can generate a simple shape in Blender, export its depth map, then feed it to your model. Or draw a sketch, or use a normal map, or a combination of all of these.

I got some great results today when experimenting with it: https://twitter.com/dgellow/status/1638496715056480259?s=20
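
For the depth-map route mentioned above, here is a minimal sketch of just the normalization step, not anyone's actual pipeline. It assumes the exported depth is a plain grid of distances and that the model wants an 8-bit grayscale map with nearer surfaces brighter (the convention I'd expect from MiDaS-style depth maps, but check what your ControlNet model was trained on). The function name `depth_to_grayscale` is hypothetical.

```python
def depth_to_grayscale(depth, invert=True):
    """Map a 2D grid of raw depth values to 8-bit grayscale (0-255).

    Assumption: with invert=True, the nearest point becomes white (255)
    and the farthest becomes black (0).
    """
    flat = [d for row in depth for d in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo

    out = []
    for row in depth:
        new_row = []
        for d in row:
            # Normalize to 0..1; a completely flat map has no range to scale.
            t = 0.0 if span == 0 else (d - lo) / span
            if invert:
                t = 1.0 - t
            new_row.append(round(t * 255))
        out.append(new_row)
    return out


if __name__ == "__main__":
    # Toy 1x2 "depth map": one point 1 unit away, one 5 units away.
    print(depth_to_grayscale([[1.0, 5.0]]))
```

The resulting grid would still need to be saved as an image before being handed to the model as the conditioning input.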

I reckon we’ll move out of the prompt difficulty phase you mention simply when the context window gets big enough.

If you were able to give Midjourney a short textual instruction, a hand-drawn sketch and a reference image from a human artist all together as a prompt, then I’m pretty sure it could produce the image of a boy doing a high jump as you intend.

We already see extended-length multimedia prompts in GPT-4, so it doesn’t seem like an impossible leap for Midjourney, DALL-E, etc.

Midjourney already allows this - sort of - with image remixing.

From everything I tried, the results were worse.

Again, I think this is going to remain a problem for a long time - but it will probably improve slightly with each iteration. Either way, there are so many use cases where the cost-benefit will massively favor AI-generated art, and I think the % of cases will continue to increase - albeit slowly.

Similar to self-driving cars - they've been in limited availability in Phoenix for a long time, and now SF. The list of cities will grow, and the limitations will decrease - but I still can't see the vast majority of trips being self-driven within the next 20 years.

In the same way, I don't see AI generating the vast majority of Pixar films in 20 years. Nor AI generating Marvel comic strips or kids cartoons. Etc.

Sure - some people will be using it for these use cases. They already are, and were before GPT.

I don't see this killing jobs, but limiting job growth instead.

You can already give MJ a reference image, just by putting the URL of the image as the first thing after the imagine prompt and before the text description.
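
As a rough sketch of that ordering (image URL first, text description after), assuming the prompt is typed as one line starting with `/imagine prompt:`. The helper name `build_imagine_prompt` and the example URL are made up for illustration; this only builds the string and does not talk to Midjourney.

```python
def build_imagine_prompt(description, image_urls=()):
    """Put any reference image URLs first, then the text description."""
    parts = [*image_urls, description.strip()]
    return "/imagine prompt: " + " ".join(parts)


if __name__ == "__main__":
    print(build_imagine_prompt(
        "a boy doing a high jump",
        ["https://example.com/reference.png"],  # placeholder URL
    ))
```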

I agree, it's impressive. But it's still not at the level of being useful. For example, I would never use any of these generative art images in company ads or marketing materials. They're not in the uncanny valley, but they're closer to that than to something one would commission from a designer.

The main limitation on it now is not in the generation, but the interface. Verbal prompts are fine if you really don't know what you want, but they just give you a generic output. Going I2I (image-to-image) or anything of that sort, you're making or borrowing human art and then asking the AI "could you make it pretty for me?"

It's a question of what information the image actually encodes. The part that "tells a thousand words" in an illustrative sense, you still have to make.