by alok-g
1387 days ago
I understand neural networks, embeddings, convolutions, etc. The part that's unclear to me is specifically how the textual embeddings are linked into the img-to-img network that is trying to reduce the noise. In other words, I am missing how the process is 'conditioned upon' the text. (I lack the same understanding for conditional GANs as well.) If the answer is just that the textual embeddings are also fed as simple inputs to the network, then I already understand.

So the model understands (kinda) who Bob Moog is, so when you include "Bob Moog" in the prompt, the model knows what you are looking for.
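
On the 'conditioned upon' question above: one common way to wire text into the denoiser in text-to-image diffusion models is cross-attention, where image features act as queries and text token embeddings act as keys and values. The sketch below is only an illustration of that idea in plain Python, not the code of any particular model. Every name in it (cross_attention, w_q, w_k, w_v, the toy sizes) is hypothetical, and the weights are random rather than trained.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    """Random stand-in for learned weights or features (assumption: untrained)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def cross_attention(image_feats, text_embs, w_q, w_k, w_v):
    """Mix text information into image features.

    image_feats: (n_pixels, d_img) features from the denoising network
    text_embs:   (n_tokens, d_txt) embeddings of the prompt tokens
    """
    q = matmul(image_feats, w_q)           # queries come from the image
    k = matmul(text_embs, w_k)             # keys come from the text
    v = matmul(text_embs, w_v)             # values come from the text
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]   # transpose keys to (d, n_tokens)
    scores = matmul(q, k_t)                # (n_pixels, n_tokens)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    attended = matmul(weights, v)          # (n_pixels, d_img)
    # Residual connection: image features plus what each pixel read from the text.
    out = [[f + a for f, a in zip(f_row, a_row)]
           for f_row, a_row in zip(image_feats, attended)]
    return out, weights

# Toy sizes, chosen only for illustration.
rng = random.Random(0)
n_pixels, d_img, n_tokens, d_txt, d_attn = 4, 8, 3, 6, 5
image_feats = rand_matrix(n_pixels, d_img, rng)
text_embs = rand_matrix(n_tokens, d_txt, rng)
w_q = rand_matrix(d_img, d_attn, rng)
w_k = rand_matrix(d_txt, d_attn, rng)
w_v = rand_matrix(d_txt, d_img, rng)

out, weights = cross_attention(image_feats, text_embs, w_q, w_k, w_v)

# A different prompt (different text embeddings) gives different image features.
other_text = rand_matrix(n_tokens, d_txt, rng)
out_other, _ = cross_attention(image_feats, other_text, w_q, w_k, w_v)
```

So it is a bit more than feeding the embeddings in as extra inputs at the front: in this kind of design, each spatial location decides how much to read from each prompt token, and that can happen at several layers of the denoiser.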