by jbhuang0604 989 days ago
Yes, you can expand the rich-text information into a long sentence. We call this "full-text" in the paper. The issue with using full-text is that it's hard to edit the image interactively: every time you change the text, you get an entirely different image.
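To make the "full-text" idea concrete, here is a minimal sketch of flattening rich-text attributes into one plain prompt. The `Span` type, the `to_full_text` helper, and the rule of putting the color in front of the span's last word are all hypothetical illustrations, not the paper's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """A piece of prompt text plus optional rich-text attributes (hypothetical format)."""
    text: str
    attrs: dict = field(default_factory=dict)


def to_full_text(spans):
    """Flatten rich-text spans into one plain prompt.

    Only a 'color' attribute is handled: its value is inserted as an
    adjective in front of the span's last word.
    """
    words = []
    for span in spans:
        tokens = span.text.split()
        color = span.attrs.get("color")
        if color and tokens:
            tokens.insert(len(tokens) - 1, color)
        words.extend(tokens)
    return " ".join(words)


plain = [Span("a rustic cabin")]
rich = [Span("a rustic cabin", {"color": "orange"})]
print(to_full_text(plain))  # a rustic cabin
print(to_full_text(rich))   # a rustic orange cabin
```

The output is simply a new prompt string, so the model has to encode and sample it from scratch; nothing ties the second image to the first.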

I am sorry, but I really don't understand.

With the same seed, and an extremely similar prompt, why would you get an entirely different image?

If I take seed 9999999 (just an example) and my prompts are

(1) "very large gothic church at dusk, spooky, horror, red roses" and

(2) "very large gothic church at dusk, spooky, horror, white roses"

then with all the models I tested over the last year or so, you get _very_ similar images, with differently colored roses and (at most) very minor changes elsewhere. This only seems to work if you keep in mind that the prompt is parsed left to right, so changes closer to the beginning of the prompt have larger effects. Again, of course, you need the same seed.
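The two ingredients of this observation can be sketched with a toy model. This is a simplification under stated assumptions: a short list of Gaussian draws stands in for the sampler's initial latent noise, whitespace splitting stands in for the real subword tokenizer, and the text encoder is assumed to be causal (each position attends only to itself and earlier positions, as in the CLIP text encoder used by many Stable Diffusion models). Both helper names are hypothetical.

```python
import random


def initial_noise(seed, n=8):
    """Draw n standard-normal values from a generator seeded with `seed`.

    Toy stand-in for the initial latent noise of a diffusion sampler: the
    same seed always reproduces the same starting noise.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def affected_positions(tokens_a, tokens_b):
    """Token positions whose embeddings can differ under a causal text encoder.

    Position i only sees positions <= i, so every position from the first
    edited token onward may change, while earlier positions cannot.
    """
    first = next(
        (i for i, (a, b) in enumerate(zip(tokens_a, tokens_b)) if a != b),
        min(len(tokens_a), len(tokens_b)),
    )
    return list(range(first, max(len(tokens_a), len(tokens_b))))


p1 = "very large gothic church at dusk, spooky, horror, red roses".split()
p2 = "very large gothic church at dusk, spooky, horror, white roses".split()
p3 = "very small gothic church at dusk, spooky, horror, red roses".split()

same_start = initial_noise(9999999) == initial_noise(9999999)
print(same_start)                  # True
print(affected_positions(p1, p2))  # [8, 9]
print(affected_positions(p1, p3))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

In this toy picture, editing "red" only perturbs the last two positions, while editing "large" perturbs nearly all of them, which matches the left-to-right effect described above. Real models cross-attend to every token embedding, so even a late edit can still change parts of the image.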

But with that said, why would that be any different with plain, full, or rich text? Apologies if I am somehow blinkered and asking something really obvious.

Yup, it could be similar, but it mostly only works for very simple prompts (e.g., one subject in the image).

For example, in Figure 11 of the paper (https://arxiv.org/pdf/2304.06720.pdf), you can see that editing the full-text prompt from "rustic cabin" to "rustic orange cabin" does not turn the cabin orange.

For coloring, the core benefit of our method is that it allows precise color control. For example, it can generate colors with rare names (e.g., Plum Purple or Dodger Blue) or even particular RGB triplets that we cannot describe well with texts.

You can see examples in Figure 4 here: https://arxiv.org/pdf/2304.06720.pdf