https://github.com/TonyLianLong/LLM-groundedDiffusion
And, indeed, someone has:
https://github.com/sayakpaul/caption-upsampling