| Thanks a lot for the comment! (one of the authors here) RE: plaintext
- The "plain-text" result is just a baseline. We call the "plaintext parentheses in the prompt" approach full-text (i.e., expanding the rich-text info into one long sentence). We show many full-text results in the paper: https://arxiv.org/pdf/2304.06720.pdf
You can see in Figure 11 that full-text results fail to change the color or style and do not respect the description. More examples are in Figures 13, 14, and 15. The main issue with full-text is that it cannot preserve the original plain-text image, so it requires many rounds of prompt tuning/engineering. We also compared with two other image editing methods, Prompt-to-Prompt and InstructPix2Pix, but they could not handle localized editing well. You can see some example comparisons for Color (Figure 4), Style (Figure 5), Footnote (Figure 8), and Font Size (Figure 9). https://arxiv.org/pdf/2304.06720.pdf RE: Style
- Yes, you can specify the styles you want just by describing them. It could be a single word like "Ukiyo-e" or "Van Gogh", or a more detailed description. You can check out some examples in the video: https://youtu.be/ihDbAUh0LXk?si=wVF9LIF1NVqLtNDC&t=59 The particular font family used is just a "label". Glad that you posted the comments! I hope this clarifies things. Happy to answer any questions. |
Not sure how this could be communicated better.