by kuboble 1420 days ago
For what it's worth, DALL-E is great for exploring, but it's nowhere near being able to deliver a particular image you might have in mind.

I wanted a very particular, well defined scene:

- A pig and a donkey play poker at the poker table.
- The pig is using a computer while playing and we can see the screen of the pig.
- The pig must look like a pig
- The donkey must look like a donkey
- The cards and chips must look like chips and cards

DALL-E simply can't deliver. Nothing is even remotely close to what I want. The best thing I came up with after dozens of attempts (I bought extra credits) is something like this: https://i.gyazo.com/4bec0651b78f29a45c291a7f48f468e4.jpg

Kinda there, but the pig doesn't look like a pig, or the donkey doesn't look like a donkey, or it's not the pig that has the computer, and the cards and chips never look like cards and chips.

So in short - nobody is losing their jobs yet I think.

5 comments

Have you tried creating it in multiple steps, using the "Edit" button? You can erase the parts of the image you want to change, and you can even change the prompt at each step.

If the pig or donkey doesn't look right, you could erase just that part of the image using the same prompt to get a different look.

For example, to create the image you want, I would:

1. Start with the basic prompt: "a pig and donkey playing poker"

2. Generate random variations of my favourite image from step 1 to see how far that gets me.

3. Edit as necessary with the same prompt to get the right look for the pig/donkey.

4. Erase a section of the image next to the pig and use a prompt like "pig using a laptop" to get DALL-E to generate a laptop in that position.

Yes, I have tried a lot, and still haven't gotten close to the desired end-effect.

Maybe I want to shift my claim. I'm not saying it's impossible to create this particular image, but it's almost certainly cheaper to hire someone to draw the exact image I have in mind.

I think there is also a new profession coming: a DALL-E prompter job.

> I think there is also a new profession coming: a DALL-E prompter job.

Exactly, except we call this job "Artist" or "Programmer".

Whenever something like this comes along and people decry that it will "replace artists" or "replace programmers"... someone needs to generate the inputs to get what they want. Nothing helps solve the "But I know what I mean" problem. Either it's not good enough to do "general purpose" tasks, or it is, but it needs coaxing and someone who understands interacting with the systems well enough to get the desired output.

I agree with everything you say, except that it is very distinct from being a programmer or an artist like a painter or graphic designer.

As a programmer I love that when I type [i*i for i in range(10)] I can predict the output and that the output will always be the same. I get frustrated if the same action produces unexpected and non-reproducible results.
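To make the contrast concrete, here is a minimal Python sketch. The list comprehension is the one quoted above; `sample_noise` is a hypothetical helper I made up for illustration, standing in for any sampling-based process. It says nothing about how DALL-E works internally, only that sampled output repeats when the seed is pinned and generally doesn't otherwise.

```python
import random

# Deterministic: the same expression always yields the same list.
squares = [i * i for i in range(10)]
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


def sample_noise(seed=None, n=3):
    """Hypothetical stand-in for a sampling step; returns n random floats."""
    rng = random.Random(seed)  # seed=None means "seed from system entropy"
    return [rng.random() for _ in range(n)]


# With a pinned seed, two runs produce identical samples.
assert sample_noise(seed=42) == sample_noise(seed=42)

# Without a seed, each call draws fresh values, so runs are not reproducible.
print(sample_noise())
print(sample_noise())
```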

A good DALL-E prompter is more like a guide who can navigate through the unknowns. He knows how to use seemingly meaningless words to manipulate the beast. I think it's some form of art and at the same time like being a technician for complex machinery or a wild animal trainer.

These AI-created images may not be a replacement for bespoke illustration or photography, but if the choice is between stock images and DALL-E, many people would prefer a DALL-E image that fits closer to what they want than anything they can find by searching a stock image website.
I suspect this is where an API and additional cost reductions will move the needle even before we improve the models themselves (which seems to be coming at a rapid pace right now). I can see a scenario like this working well in the future:

1. Get close via prompt debugging to what you want (effectively where you are now)

2. Run an image generation pipeline that creates 10,000 images or an infinite stream

3. Run each image through an 'image to text' step for vector similarity filtering

4. Take the images whose 'image to text' output scores as very similar to the original prompt and present them to the user.
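Steps 3 and 4 could be sketched roughly like this in Python. Everything here is an assumption for illustration: `filter_candidates` and the `caption_image` callback are hypothetical names, the captions are hard-coded instead of coming from a real image-to-text model, and bag-of-words cosine similarity is a crude stand-in for real vector embeddings.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase word counts; a crude stand-in for a real text embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_candidates(prompt, images, caption_image, threshold=0.8):
    """Keep images whose caption is similar enough to the prompt, best first.

    caption_image is a hypothetical callback (image -> caption text); in a
    real pipeline it would wrap an image-to-text model.
    """
    target = bag_of_words(prompt)
    scored = [
        (cosine_similarity(target, bag_of_words(caption_image(img))), img)
        for img in images
    ]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


# Made-up captions standing in for generated images and their descriptions.
fake_captions = {
    "img_001": "a pig and a donkey play poker",
    "img_002": "a horse eating grass",
    "img_003": "a pig play poker with a donkey at a table",
}
prompt = "a pig and a donkey play poker"
results = filter_candidates(prompt, fake_captions, fake_captions.get, threshold=0.5)
for score, image in results:
    print(f"{image}: {score:.2f}")
```

The threshold is the knob to tune: set it too high and almost nothing survives, too low and you are back to sifting through thousands of images by hand.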

Once we can run models of this quality locally, it can even be a job that runs overnight and you wake up in the morning to a set of results to look at.

It has a hard time with the computer, but without it, the results are almost usable:

https://imgur.com/a/lVqmnz3

Chances are that someone with prompt engineering experience could get it to produce the desired output with some more poking and prodding.

It'll certainly raise the lower-end bar for custom illustrations/stock footage.

I see what you're getting at, yet the result is still amazing.