
by jbhuang0604 985 days ago

Thanks a lot for the comment! (one of the authors here)

RE: plaintext - The "plain-text" result is just a baseline. We call the "plaintext parentheses in the prompt" approach full-text (i.e., expanding the rich-text info into a long sentence). We show many "full-text" results in the paper: https://arxiv.org/pdf/2304.06720.pdf. You can see in Figure 11 that full-text results cannot change the color or style, and do not respect the description. More examples are in Figures 13, 14, and 15.

The main issue with using full-text is that it cannot preserve the image generated from the original plain text, so it requires many rounds of prompt tuning/engineering. We also compared with two other image editing methods, Prompt-to-prompt and InstructPix2Pix, but they could not handle localized editing well. You can see some example comparisons for Color (Figure 4), Style (Figure 5), Footnote (Figure 8), and Font Size (Figure 9). https://arxiv.org/pdf/2304.06720.pdf
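To make the "full-text" baseline concrete, here is a minimal sketch of what "expanding the rich text info into a long sentence" could look like. The `Span` class, the attribute names, and the exact parenthesis format are assumptions for illustration only; they are not taken from the paper or its code.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical rich-text span: a piece of prompt text plus optional attributes."""
    text: str
    attrs: dict = field(default_factory=dict)


def to_plain_text(spans):
    """Drop all attributes and keep only the words (the plain-text baseline)."""
    return "".join(span.text for span in spans)


def to_full_text(spans):
    """Fold each span's attributes into parentheses right after its text."""
    parts = []
    for span in spans:
        if span.attrs:
            detail = ", ".join(f"{key}: {value}" for key, value in span.attrs.items())
            parts.append(f"{span.text} ({detail})")
        else:
            parts.append(span.text)
    return "".join(parts)


# Assumed example prompt; the attribute keys are illustrative, not the paper's schema.
spans = [
    Span("a "),
    Span("church", {"color": "orange"}),
    Span(" in a "),
    Span("garden", {"style": "Ukiyo-e"}),
]

plain_prompt = to_plain_text(spans)
full_prompt = to_full_text(spans)
```

Because the full-text prompt is a different string from the plain-text one, the generator sees a new prompt end to end, which is one way to picture why the original image is not preserved.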

RE: Style - Yes, you can specify the styles you want just by describing them.

It could be a simple word like "Ukiyo-e" or "Van Gogh", or a more detailed description. You can check out some examples in the video: https://youtu.be/ihDbAUh0LXk?si=wVF9LIF1NVqLtNDC&t=59

The particular font family used is just a "label".

Glad that you posted the comments! I hope this clarifies things. Happy to answer any questions.


yorwba 984 days ago

I get now that with the side-by-side "plain text"/"rich text" comparisons you're trying to highlight how similar the two images are, differing only in the regions that are annotated in the rich-text version. But my first impression was that you're comparing against a weak baseline, which doesn't look so good.

Not sure how this could be communicated better.


jbhuang0604 984 days ago

Got it! Thanks for the feedback! This is definitely something we can improve.


simbolit 984 days ago

This!


tudorw 984 days ago

Very interesting. Is there a way to retrieve the segment information, then surface that in a UI so I can select and regenerate single elements?


jbhuang0604 984 days ago

Yes, if you go to the Hugging Face demo (https://huggingface.co/spaces/songweig/rich-text-to-image), you can find the segmentation information at the bottom-right of the rich-text generation result.
