by onlyrealcuzzo 1191 days ago
Midjourney, DALL-E, and many of these generative AIs have been at human-level quality for a long time.

The problem is - unlike a human - it's pretty hard to get them to do something close to what you have in mind. Sure, if you try a few dozen prompts - you'll probably eventually get something close to what you want.

And considering the cost of this will approach free - it's going to be hard for artists to compete.

I tried getting Midjourney to generate an image of a boy doing a high jump - and no matter what I tried - the boy is hurdling over the bar rather than high jumping over it.

The quality of the images is great - human-level. But it's not what I want.

I think we'll be stuck in this phase for a very long time, like self-driving cars.

2 comments

Have you tried Stable Diffusion + ControlNet? That gives you a lot of control over the output. You can generate a simple shape in Blender, export its depth map, then feed it to your model. Or draw a sketch, or use a normal map, or a combination of all of these.

I got some great results today when experimenting with it: https://twitter.com/dgellow/status/1638496715056480259?s=20
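For anyone trying the depth-map route, here is a rough sketch of the preprocessing step, not a definitive recipe. It assumes the Blender export stores distance from the camera (larger = farther) and that the depth ControlNet you use expects an 8-bit image where nearer surfaces are brighter; check both against your own setup. The function name `depth_to_control_image` is made up for illustration.

```python
def depth_to_control_image(depth):
    """Map raw depth values (larger = farther) to 0-255, nearer = brighter.

    `depth` is a list of rows of floats. The orientation (near = bright)
    is an assumption about what the depth-conditioned model expects.
    """
    flat = [d for row in depth for d in row]
    near, far = min(flat), max(flat)
    if far == near:
        # No depth variation at all: return a uniform mid-gray image.
        return [[128 for _ in row] for row in depth]
    span = far - near
    # Min-max normalize and invert so the nearest point maps to 255.
    return [[round(255 * (far - d) / span) for d in row] for row in depth]


# Tiny 2x2 "depth map": 1.0 is nearest, 5.0 is farthest.
control = depth_to_control_image([[1.0, 2.0], [5.0, 5.0]])
```

The resulting grid would then be saved as a grayscale image and passed as the conditioning image alongside the text prompt.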

I reckon we’ll move out of the prompt difficulty phase you mention simply when the context window gets big enough.

If you were able to give Midjourney a short textual instruction, a hand-drawn sketch and a reference image from a human artist all together as a prompt, then I’m pretty sure it could produce the image of a boy doing a high jump as you intend.

We already see extended-length multimedia prompts in GPT-4, so it doesn’t seem like an impossible leap for Midjourney/DALL-E etc.

Midjourney already allows this - sort of - with image remixing.

From everything I tried, the results were worse.

Again, I think this is going to remain a problem for a long time - but it will probably improve slightly with each iteration. Either way, there are so many use cases where the cost-benefit will massively favor AI-generated art, and I think the % of cases will continue to increase - albeit slowly.

Similar to self-driving cars - they've been in limited availability in Phoenix for a long time, and now SF. The list of cities will grow, and the limitations will decrease - but I still can't see the vast majority of trips being self-driven within the next 20 years.

In the same way, I don't see AI generating the vast majority of Pixar films in 20 years. Nor AI generating Marvel comic strips or kids cartoons. Etc.

Sure - some people will be using it for these use cases. They already are, and were before GPT.

I don't see this killing jobs, but limiting job growth instead.

You can already give MJ a reference image, just by putting the URL of the image as the first thing after the imagine prompt and before the text description.
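To make the ordering concrete, here is a small sketch that assembles such a prompt string. The `/imagine prompt:` prefix reflects how the Discord command is usually written, and the helper name `build_imagine_prompt` and the example URL are hypothetical; this only builds text and does not call any Midjourney API.

```python
def build_imagine_prompt(image_url: str, description: str) -> str:
    """Put the reference image URL first, then the text description."""
    return f"/imagine prompt: {image_url.strip()} {description.strip()}"


# Hypothetical reference image URL, for illustration only.
prompt = build_imagine_prompt(
    "https://example.com/high-jump.jpg",
    "a boy doing a high jump",
)
print(prompt)
```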