by elil17
39 days ago
I wonder whether this could be used to fine-tune image models to produce better outputs. Something like this:

1. Algorithmically generate an underdrawing (e.g. place numbers and shapes randomly in the underdrawing).
2. Algorithmically generate a description of the underdrawing (e.g. for each shape, output text like "there is a square with the number three in the top left corner"). You might fuzz this by having an LLM rewrite the descriptions in a variety of ways.
3. Generate a "ground truth" image using the underdrawing and an image+text-to-image model.
4. Use the generated description and the generated "ground truth" image as training data for a text-to-image model.
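
A rough sketch of what steps 1 and 2 might look like, just to make the idea concrete. Everything here is an assumption for illustration: the 3x3 grid, the shape list, and the names `Item`, `generate_underdrawing`, and `describe` are made up, and steps 3 and 4 (rendering the "ground truth" image and the actual fine-tuning) are left out because they depend on whichever models you use.

```python
import random
from dataclasses import dataclass

# Hypothetical vocabulary; any set of shapes and labels would work.
SHAPES = ["square", "circle", "triangle"]
NUMBER_WORDS = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]

# Assumed 3x3 layout: (row, col) -> phrase used in the description.
POSITIONS = {
    (0, 0): "top left corner", (0, 1): "top center", (0, 2): "top right corner",
    (1, 0): "middle left", (1, 1): "center", (1, 2): "middle right",
    (2, 0): "bottom left corner", (2, 1): "bottom center", (2, 2): "bottom right corner",
}


@dataclass(frozen=True)
class Item:
    shape: str
    number: int  # digit 0-9 drawn inside the shape
    row: int
    col: int


def generate_underdrawing(n_items: int, rng: random.Random) -> list[Item]:
    """Step 1: place n_items random shapes in distinct grid cells."""
    cells = rng.sample(sorted(POSITIONS), n_items)  # no two items share a cell
    return [Item(rng.choice(SHAPES), rng.randrange(10), r, c) for r, c in cells]


def describe(items: list[Item]) -> str:
    """Step 2: turn the underdrawing spec into a plain-text description."""
    sentences = [
        f"There is a {it.shape} with the number {NUMBER_WORDS[it.number]} "
        f"in the {POSITIONS[(it.row, it.col)]}."
        for it in items
    ]
    return " ".join(sentences)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the dataset is reproducible
    items = generate_underdrawing(3, rng)
    print(describe(items))
```

The `Item` list is the underdrawing spec; you would rasterize it to feed the image+text-to-image model in step 3, and optionally pass the output of `describe` through an LLM to get the fuzzed rewrites.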