| HN Mirror



	by yk 442 days ago
	Tried Flux.dev with the same prompts [0], and it actually seems to be a GPT problem. It could be that GPT's text encoder understands the prompt better and simply generates the implied IP, or it could be that a diffusion model is inherently less prone to overfitting than a multimodal transformer model. [0] https://imgur.com/a/wqrBGRF Image captions are the implied IP; I copied the prompts from the blog post.

1 comment

jsemrau 442 days ago

DALL-E 3 already uses a model, trained on synthetic data, that takes the prompt and augments it. This might lead to the overfitting. It could also be, and this might be the simpler explanation, that it just looks up the right file from a RAG.
