by loxias 618 days ago
It's quite good at following a detailed, paragraph-long description of a scene, which is a double-edged sword. A lot of the fun for me with early text-to-image models was underspecifying an image and then enjoying how the model "invents" it. "Steampunk spaceship", "communist bear", "glass city".

Flux is amazing, but I find it requires a very literal description, which pushes the "creative work" back to the text itself. That can certainly be a good thing, it's just a bit less gratifying to non-visual types like myself. :)

I wonder, only somewhat jokingly, if one could make text generators that "imagine" detailed, fantastical scenes suitable for feeding to a text-to-image model.

2 comments

That's what Fooocus is - it allows you to specify a "text expander" LLM that sits in between the input prompt and the diffusion model.

https://github.com/lllyasviel/Fooocus
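
For anyone curious what "sits in between" looks like in practice, here's a rough sketch of the general pattern. To be clear, this is not Fooocus's code or API: the word lists, `expand_prompt`, and `generate` are all made up for illustration, and a seeded random picker stands in for the LLM. The `text_to_image` argument is a placeholder for whatever diffusion backend you actually call.

```python
import random
from typing import Callable, Optional

# Hypothetical word lists standing in for what an LLM expander would "imagine".
DETAILS = ["intricate brass fittings", "drifting fog", "weathered textures"]
LIGHTING = ["golden hour light", "cold neon glow", "overcast diffuse light"]
STYLES = ["oil painting", "35mm photograph", "watercolor sketch"]


def expand_prompt(short_prompt: str, seed: Optional[int] = None) -> str:
    """Turn an underspecified prompt into a more literal one.

    A real expander would call a language model here; this toy version
    just appends one randomly chosen detail, lighting, and style phrase.
    """
    rng = random.Random(seed)  # seeded so the same input can be reproduced
    parts = [
        short_prompt.strip(),
        rng.choice(DETAILS),
        rng.choice(LIGHTING),
        rng.choice(STYLES),
    ]
    return ", ".join(parts)


def generate(
    short_prompt: str,
    text_to_image: Callable[[str], object],
    expander: Callable[[str, Optional[int]], str] = expand_prompt,
    seed: Optional[int] = None,
) -> object:
    """Run the expander first, then hand the detailed prompt to the image model."""
    detailed = expander(short_prompt, seed)
    return text_to_image(detailed)


if __name__ == "__main__":
    # Stand-in backend that just echoes the prompt it was given.
    print(generate("steampunk spaceship", text_to_image=lambda p: p, seed=1))
```

Changing the seed gives a different "invention" for the same short prompt, which gets back some of the fun of underspecifying.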

Prompt enhancement is now a standard feature in many image generation tools.