by andrewfong
839 days ago
My understanding of this tech is pretty minimal, so please bear with me, but is the basic idea something like this?

Before: Evaluate the image in a small region around each pixel against the prompt as a whole, e.g. how well does a little 10x10 chunk of pixels map to a prompt about a "red sphere and blue cube"? This is problematic because maybe all the pixels are red, but you can't "see" whether they belong to the sphere or the cube.

After: Evaluate the image as a whole against chunks of the prompt. So now we're looking at a room, and then we patch in (layer?) a "red sphere" and then do it again with a "blue cube".

Is that roughly the idea?