| HN Mirror


by elil17 39 days ago

I wonder whether this could be used to fine-tune image models to provide better outputs. Something like this:

1. Algorithmically generate an underdrawing (e.g. place numbers and shapes randomly in the underdrawing)

2. Algorithmically generate a description of the underdrawing (e.g. for each shape, output text like "there is a square with the number three in the top left corner"). You might fuzz this by having an LLM rewrite the descriptions in a variety of ways.

3. Generate a "ground truth" image using the underdrawing and an image+text-to-image model.

4. Use the generated description and the generated "ground truth" image as training data for a text-to-image model.
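
A rough sketch of steps 1 and 2, assuming a 3x3 grid of regions and a small fixed vocabulary of shapes. Every name here (`Element`, `generate_underdrawing`, `describe`) is made up for illustration. Steps 3 and 4 depend on whichever image models you use, so they are left out.

```python
import random
from dataclasses import dataclass

# Assumed vocabulary and layout; the comment above does not fix these.
SHAPES = ["square", "circle", "triangle"]
NUMBER_WORDS = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]
ROWS = ["top", "middle", "bottom"]
COLS = ["left", "center", "right"]


@dataclass(frozen=True)
class Element:
    shape: str
    number: int  # digit 0-9 drawn inside the shape
    row: int     # 0-2, index into ROWS
    col: int     # 0-2, index into COLS


def generate_underdrawing(rng, n_elements=3):
    """Step 1: place random shapes and numbers in distinct grid cells."""
    all_cells = [(r, c) for r in range(3) for c in range(3)]
    cells = rng.sample(all_cells, n_elements)
    return [Element(rng.choice(SHAPES), rng.randrange(10), r, c)
            for r, c in cells]


def region_name(row, col):
    """Turn a grid cell into a phrase such as 'top left corner'."""
    if row == 1 and col == 1:
        return "center"
    if row == 1:
        return f"middle {COLS[col]}"
    if col == 1:
        return f"{ROWS[row]} center"
    return f"{ROWS[row]} {COLS[col]} corner"


def describe(elements):
    """Step 2: one templated sentence per element."""
    return " ".join(
        f"There is a {e.shape} with the number {NUMBER_WORDS[e.number]} "
        f"in the {region_name(e.row, e.col)}."
        for e in elements
    )


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so a dataset can be regenerated
    underdrawing = generate_underdrawing(rng)
    print(describe(underdrawing))
```

The templated text from `describe` is what you would hand to an LLM for the fuzzing pass, and the `Element` list is what you would rasterize into the actual underdrawing image for step 3.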

2 comments

vunderba 39 days ago

This is closer to a world model, similar to how one might use a realistic or semi-realistic simulation engine such as GTA to model the environment when training a self-driving model.


hirako2000 39 days ago

That would complicate the architecture of a model to solve a finite set of cases. It's an argument for specialised/fine-tuned models, though.
