by mdp2021
1201 days ago
So: technically, and beyond that theoretically, and even anatomically. In the brain's visual processing, the left hemisphere was found to deal with details and the right hemisphere with structural relations: a whole is composed of elements and their relative positions. In Convolutional Neural Networks, the "near, direct" layers (close to the input) capture analytic detail, while the "far, abstract" layers capture synthetic shapes. So, implementation-wise, you can take e.g. descriptions as the abstract part and a "pre-acquired" memory of details as the «graphic» part.

Edit: About the "combination", well, that is the whole purpose of this new technology proposal, "ControlNet": formerly you may have had some "transformer" from input to output, and now "conditional controls" are added to it (through a "zero convolution" technique). See "Adding Conditional Control to Text-to-Image Diffusion Models", https://arxiv.org/abs/2302.05543
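A minimal sketch of the zero-convolution idea, in plain Python with no deep-learning library, loosely following the formulation in the linked paper: the frozen block's output is kept as-is, and a control branch (a trainable copy of the block, fed the input plus a zero-convolved condition) is added back through a second zero convolution. Because both 1x1 convolutions start with all-zero weights and biases, the combined output initially equals the original block's output, so plugging in the control does not disturb the pretrained model before any training. The names `conv1x1`, `zero_conv`, `controlled_block` and the stand-in `pretrained_block` (which just doubles its input) are hypothetical illustrations, and no training step is shown.

```python
from typing import List, Tuple

# Feature map layout: [channels][height][width], as plain nested lists.
FeatureMap = List[List[List[float]]]
# (weight[c_out][c_in], bias[c_out])
ConvParams = Tuple[List[List[float]], List[float]]


def conv1x1(x: FeatureMap, params: ConvParams) -> FeatureMap:
    """1x1 convolution: at every pixel, each output channel is a weighted
    sum of the input channels at that same pixel, plus a per-channel bias."""
    weight, bias = params
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            [sum(w_oi * x[i][r][c] for i, w_oi in enumerate(weight[o])) + bias[o]
             for c in range(width)]
            for r in range(height)
        ]
        for o in range(len(weight))
    ]


def zero_conv(c_in: int, c_out: int) -> ConvParams:
    """A 'zero convolution': a 1x1 conv whose weight and bias start at zero."""
    return [[0.0] * c_in for _ in range(c_out)], [0.0] * c_out


def add(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Element-wise sum of two feature maps of the same shape."""
    return [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def pretrained_block(x: FeatureMap) -> FeatureMap:
    """Hypothetical stand-in for a frozen pretrained block (doubles every value)."""
    return [[[2.0 * v for v in row] for row in ch] for ch in x]


def controlled_block(x: FeatureMap, cond: FeatureMap,
                     z_in: ConvParams, z_out: ConvParams,
                     trainable_copy=pretrained_block) -> FeatureMap:
    """out = F(x) + Z_out(F_copy(x + Z_in(cond))).
    The trainable copy starts identical to the frozen block; here it is reused."""
    locked = pretrained_block(x)                              # frozen path
    copy_out = trainable_copy(add(x, conv1x1(cond, z_in)))    # control path
    return add(locked, conv1x1(copy_out, z_out))


x = [[[1.0, 2.0], [3.0, 4.0]]]     # 1 channel, 2x2 input
cond = [[[5.0, 0.0], [0.0, 5.0]]]  # 1-channel condition (e.g. an edge map)
out = controlled_block(x, cond, zero_conv(1, 1), zero_conv(1, 1))
print(out == pretrained_block(x))  # True: zero convs leave the frozen output untouched
```

Once training moves the zero-convolution weights away from zero, the condition starts to shape the output; e.g. with `z_in = ([[1.0]], [0.0])` and `z_out = ([[0.5]], [0.0])` the same input yields `[[[8.0, 6.0], [9.0, 17.0]]]` instead of `[[[2.0, 4.0], [6.0, 8.0]]]`.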