HN Mirror



	by ilaksh 625 days ago
	Pretty smart model. Here's one I made: https://replicate.com/p/6ez0x8xqvsrga0cjadg8m7bah0

6 comments

jug 625 days ago

One thing that makes FLUX so special is its prompt understanding. I just gave FLUX 1.1 the prompt "Closeup of a doll house built to resemble a famous room in the TV show Friends" and it gave me one with a "Central Perk" sign, even though I never prompted for the text "Central Perk". A Redditor also discovered that it has an associative understanding of emotions. For example, given "Rose of passion" it may draw a flower that is burning, because passion is fiery.

This is miles ahead of most other image generation models available today.


drdaeman 625 days ago

Yet, it doesn't seem to know what a Tektronix 4010 actually looks like... ;)

I had similar issues trying to make an "I cast non-magic missile" meme with a fantasy wizard using a missile launcher. No model out there (I've tried SD, SDXL, FLUX.1dev and now this FLUX1.1pro) knows what a missile launcher looks like (neither as a generic term nor as any specific system), and none has a clue how it's held, so they all draw really weird contraptions.


morbicer 625 days ago

Isn't it because the shoulder-launched weapon is usually called a rocket launcher, RPG or bazooka? I've never heard it referred to as a missile launcher.


drdaeman 625 days ago

I've tried all of those and then some (e.g. "ATGM"), plus various specific names (like "FGM-148 Javelin", "M1 Bazooka", or "RPG-7", which are all quite iconic and well-recognized, so I thought some of those might appear in the training data) - all no bueno. The models are simply unaware of such devices; their best "guess" is that it's a weapon, so they draw something rifle- or pistol-shaped.

And, sure, that's what LoRAs are for, if I can figure out how to train one for FLUX in a way that would actually produce something meaningful (my pitiful attempts at SDXL LoRA training were... less than stellar, and FLUX is quite different from everything). Although that's probably not worth it for making a meme picture...


nikcub 625 days ago

I've gone from counting fingers on a hand to keys on a keyboard


PcChip 625 days ago

agreed - pretty impressive! https://replicate.com/p/ajfrva4p4hrge0cjaf3bncfwn4


loufe 625 days ago

That is astoundingly good adherence to the description. I already liked and was impressed by Flux1 but that is perhaps the most impressive image generation I've ever seen.


miohtama 625 days ago

Is it going to be able to go head-to-head against Midjourney?


vunderba 625 days ago

MJ is by far the worst model for complex prompt ADHERENCE, though it has excellent compositional quality.

Comparison of a similar prompt using Midjourney 6.1:

https://imgur.com/a/WBnPl7I

Also, FLUX (schnell, dev) can be run on your local machine.

If you really want to use a paid service, Ideogram is probably the best one out there that balances quality with adherence. DALL-E 3 also has good adherence, though the quality can sometimes be iffy, and it's very puritanical in terms of censorship.


loxias 625 days ago

It's quite good at following a detailed, paragraph-long description of a scene, which is a double-edged sword. A lot of the fun for me with early text-to-image models was underspecifying an image and then enjoying how the model "invents" it. "Steampunk spaceship", "communist bear", "glass city".

FLUX is amazing, but I find it requires a very literal description, which pushes the "creative work" back to the text itself. That can certainly be a good thing, just a bit less gratifying to non-visual types like myself. :)

I wonder, only somewhat jokingly, if one could make text generators which "imagine" detailed fantastical scenes, suitable for feeding to a text-to-image model.


vunderba 625 days ago

That's what Fooocus is - it allows you to specify a "text expander" LLM that sits in between the input prompt and the diffusion model.

https://github.com/lllyasviel/Fooocus
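
To make the "expander sits in between" idea concrete, here is a minimal sketch of that pipeline shape. It is not Fooocus's actual code or API: `expand_prompt`, `toy_expander`, and `generate_image` are hypothetical names, the expander is a fixed-string stand-in for an LLM, and the image call is a placeholder that only echoes the prompt it would receive.

```python
from typing import Callable


def expand_prompt(prompt: str, expander: Callable[[str], str]) -> str:
    """Pass the short user prompt through an expander before image generation."""
    expanded = expander(prompt).strip()
    # Fall back to the original prompt if the expander returns nothing useful.
    return expanded or prompt


def toy_expander(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: appends fixed descriptive detail."""
    details = "highly detailed, dramatic lighting, wide shot"
    return f"{prompt}, {details}"


def generate_image(prompt: str) -> str:
    """Placeholder for the diffusion model call; echoes the prompt it was given."""
    return f"<image for: {prompt}>"


short_prompt = "steampunk spaceship"
final_prompt = expand_prompt(short_prompt, toy_expander)
result = generate_image(final_prompt)
print(result)
```

Swapping `toy_expander` for a real LLM call would leave the rest unchanged, since the diffusion model only ever sees the expanded text.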


ilaksh 625 days ago

Prompt enhancement is now a standard feature in many image generation tools.
