One thing that makes FLUX so special is its prompt understanding. I just gave FLUX 1.1 the prompt "Closeup of a doll house built to resemble a famous room in the TV show Friends" and it gave me one with the sign "Central Perk", even though I never prompted for the text "Central Perk". A Redditor also discovered that it has an associative understanding of emotions. For example, given "Rose of passion" it may draw a flower that is burning, because passion is fiery.
This is miles ahead of most other image generation models available today.
Yet it doesn't seem to know what a Tektronix 4010 actually looks like... ;)
I had similar issues trying to make an "I cast non-magic missile" meme with a fantasy wizard using a missile launcher. No model out there (I've tried SD, SDXL, FLUX.1dev and now this FLUX1.1pro) knows what a missile launcher looks like (neither as a generic term nor as any specific system), and none of them has a clue how it's held, so they all draw really weird contraptions.
I've tried all of those and then some (e.g. "ATGM"), plus various specific names (like "FGM-148 Javelin", "M1 Bazooka", or "RPG-7", which are all quite iconic and well-recognized, so I thought some of them might appear in the training data) - all no bueno. The models are simply unaware of such devices; the best of their "guesses" is that it's a weapon, so they draw something rifle- or pistol-shaped.
And, sure, that's what LoRAs are for. If I can figure out how to train one for FLUX, in a way that would actually produce something meaningful (my pitiful attempts at SDXL LoRA training were... less than stellar, and FLUX is quite different from everything). Although that's probably not worth it for making a meme picture...
That is astoundingly good adherence to the description. I already liked and was impressed by Flux1 but that is perhaps the most impressive image generation I've ever seen.
Also, flux (schnell, dev) can be run on your local machine.
If you really want to use a paid service, Ideogram is probably the best one out there that balances quality with adherence. DALL-E 3 also has good adherence, though the quality can sometimes be iffy, and it's very puritanical in terms of censorship.
It's quite good at following a detailed, paragraph-long description of a scene, which is a double-edged sword. A lot of the fun for me with early text-to-image models was underspecifying an image and then enjoying how the model "invents" it. "Steampunk spaceship", "communist bear", "glass city".
FLUX is amazing, but I find it requires a very literal description, which pushes the "creative work" back to the text itself. That can certainly be a good thing, just a bit less gratifying to non-visual types like myself. :)
I wonder, only somewhat jokingly, if one could make text generators that "imagine" detailed fantastical scenes, suitable for feeding to a text-to-image model.