Hacker News new | ask | show | jobs
by fassssst 966 days ago
DALL-E within ChatGPT uses GPT-4 to rewrite what you ask for into a good text-to-image prompt. You could probably do something similar with Stable Diffusion with just a little upfront effort tuning that system prompt.
2 comments

Somewhat, but dalle3 is hugely better at understanding a description and relationships.
LLMs in general are, and that can be leveraged by using an LLM to set up layout for Stable Diffusion.

https://github.com/TonyLianLong/LLM-groundedDiffusion

> You could probably do something similar with Stable Diffusion with just a little upfront effort tuning that system prompt.

And, indeed, someone has:

https://github.com/sayakpaul/caption-upsampling