Hacker News
by jbhuang0604 985 days ago
Thanks a lot for the comment! (one of the authors here)

RE: plaintext - The "plain-text" result is just a baseline. What you describe as "plaintext parentheses in the prompt" is what we call full-text (i.e., expanding the rich-text information into one long sentence). We show many full-text results in the paper: https://arxiv.org/pdf/2304.06720.pdf. You can see in Figure 11 that the full-text results cannot change the color or style and do not respect the description. More examples are in Figures 13, 14, and 15.

The main issue with full-text is that it cannot preserve the original plain-text image, so it requires many rounds of prompt tuning/engineering. We also compared against two other image editing methods, Prompt-to-Prompt and InstructPix2Pix, but neither handled localized editing well. You can see example comparisons for Color (Figure 4), Style (Figure 5), Footnote (Figure 8), and Font Size (Figure 9): https://arxiv.org/pdf/2304.06720.pdf
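As a rough illustration of what "full-text" means here, this is a minimal sketch of flattening rich-text spans into one long prompt with parenthesized descriptions. It is not the code from the paper: the `Span` class, its field names, and the exact wording of the parenthesized phrases are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    """Hypothetical rich-text span: a piece of prompt text plus optional attributes."""
    text: str
    color: Optional[str] = None
    style: Optional[str] = None
    footnote: Optional[str] = None


def to_plain_text(spans: List[Span]) -> str:
    """Drop all attributes and keep only the text (the plain-text baseline)."""
    return "".join(s.text for s in spans)


def to_full_text(spans: List[Span]) -> str:
    """Expand each span's attributes into a parenthesized phrase after its text."""
    parts = []
    for s in spans:
        notes = []
        if s.color:
            notes.append(f"in {s.color}")
        if s.style:
            notes.append(f"in the style of {s.style}")
        if s.footnote:
            notes.append(s.footnote)
        if notes:
            # Keep trailing whitespace after the parenthesis so words stay separated.
            stripped = s.text.rstrip()
            trailing = s.text[len(stripped):]
            parts.append(f"{stripped} ({', '.join(notes)}){trailing}")
        else:
            parts.append(s.text)
    return "".join(parts)


spans = [
    Span("a "),
    Span("church", color="pink"),
    Span(" in a "),
    Span("garden", style="Van Gogh"),
]
print(to_plain_text(spans))  # a church in a garden
print(to_full_text(spans))   # a church (in pink) in a garden (in the style of Van Gogh)
```

Both strings would go to the same text-to-image model as ordinary prompts. The full-text prompt is a different sentence from the plain-text one, so nothing ties the two resulting images together.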

RE: Style - Yes, you can specify the styles you want just by describing them.

It could be a simple word like "Ukiyo-e" or "Van Gogh", or some detailed descriptions. You can check out some examples in the video: https://youtu.be/ihDbAUh0LXk?si=wVF9LIF1NVqLtNDC&t=59

The particular font family used is just a "label".

Glad that you posted the comments! I hope this clarifies things. Happy to answer any questions.

2 comments

I get now that with the side-by-side "plain text"/"rich text" comparisons, you're trying to highlight how similar the images are, differing only in the regions that are annotated in the rich-text version. But my first impression was that you were comparing against a weak baseline, which doesn't look so good.

Not sure how this could be communicated better.

Got it! Thanks for the feedback! This is definitely something we can improve.
This!
Very interesting. Is there a way to retrieve the segment information, then surface that in a UI so I can select and regenerate single elements?
Yes. In the Hugging Face demo (https://huggingface.co/spaces/songweig/rich-text-to-image), you can find the segmentation information at the bottom right of the rich-text generation result.