by yk
442 days ago
Tried Flux.dev with the same prompts [0], and it actually seems to be a GPT problem. It could be that GPT's text encoder understands the prompt better and simply generates the implied IP, or that a diffusion model is inherently less prone to overfitting than a multimodal transformer model.

[0] https://imgur.com/a/wqrBGRF (the image captions are the implied IP; I copied the prompts from the blog post)