by vessenes 622 days ago
Flux is so frustrating to me. Really good prompt adherence, strong ability to keep track of multiple parts of a scene; it's technically very impressive. However, it seems to have had no training on art-art. I can't get it to generate even something that looks like Degas, for instance. And I can't even fine-tune a painterly art style of any sort into Flux dev. I get that there was backlash from working, living artists against SD, and I can therefore imagine that the BFL team has decided not to train on art, but it's a real loss, both in terms of human knowledge of, say, composition, emotion, and so on, and for style diversity.

For goodness' sake, the Met in New York has a massive trove of open, CC0-licensed art. Dear BFL, please ease up a bit on this and add some art-art to your models; they will be better as a result.

9 comments

I've had a similar experience, incredible at generating a very specific style of image, but not great at generating anything with a specific style.

I suspect we'll see the answer to this is LoRAs. Two examples that stick out are:

- Flux Tarot v1 [0]

- Flux Amateur Photography [1]

Both of these do a great job of combining all the benefits of Flux with custom styles that seem to work quite well.

[0] https://huggingface.co/multimodalart/flux-tarot-v1

[1] https://civitai.com/models/652699?modelVersionId=756149

I like those, and there's an electroshock LoRA out there that's just awesome. That said, Tarot and others like it are "illustrator"-type styles with extra juice. I have not successfully trained a LoRA for any painting style; Flux does not seem to know about painting.
I'm curious to give this a go. I've been training a lot of LoRAs for FLUX dev recently (purely for fun). I'm sure there must be a way to get this working.

Here are a few I've recently trained: https://civitai.com/user/dvyio

This looks really good! What is your process for getting LoRAs of this quality?
Thank you!

A reasonable amount of training images (50 or so), and then I train for 2,000-ish steps for a new style.

Many of them work well with Flux, particularly if they're illustration-based. Some don't seem to work at all, so I didn't upload those!
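
As a rough sanity check on those numbers, here is a minimal sketch of how many times each image gets seen during training. The batch size of 1 is an assumption (the thread doesn't state it), and `passes_per_image` is just an illustrative helper, not part of any training tool:

```python
def passes_per_image(steps: int, num_images: int, batch_size: int = 1) -> float:
    """Average number of times each training image is seen.

    batch_size=1 is an assumption; the thread does not say what was used.
    """
    return steps * batch_size / num_images


# 2,000 steps over 50 images, one image per step
print(passes_per_image(2000, 50))  # 40.0
```

So a small image set is revisited many times, which is part of why a LoRA can pick up a style without needing tens of thousands of examples.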

How long does this take, and on what equipment? It's amazing to me that you can do this from just 50 images, I would have thought tens of thousands.
@davidbarker -- please do, that sounds awesome! I did not have good results.
It's trickier than I thought it would be.

Here are a few in Degas style I made after training for 2,500 steps. I'd love to hear what you think of them. To my (untrained) eye, they seem a little too defined, perhaps?

https://imgur.com/a/sqsQLPg

Yep, absolutely nothing like Degas. Well, I take that back: I think it picked up some favorite colors/tones. But it has no concept of the materials or poses or composition. So plasticky! Compare to https://images.app.goo.gl/JiDRYNNKUP9tczkQ7
I suspect it really needs more training examples. The problem I found when I looked for images to use was that 60% were of dancers, and from past experience, it will end up trying to fit a dancer into every image you create. But of course, there are only a (small) finite number of Degas images that you can train with.

A possible solution may be to incorporate artificial images in the training data. So, create an initial LoRA with the original Degas images and generate 500 images. From those generated images, pick the ones that most resemble Degas. Add those to the training set and train again. Repeat until (hopefully) it learns the correct style.
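
The loop described above could be sketched roughly like this. It is only an outline under stated assumptions: `train`, `generate`, and `score` are hypothetical callables you would supply yourself (a LoRA trainer, an image sampler, and some "how Degas-like is this?" ranking, which could simply be a human picking by eye), and the default counts other than the 500 generated images are made up for illustration:

```python
from typing import Callable, List, Sequence


def bootstrap_style_dataset(
    originals: Sequence[str],
    train: Callable[[Sequence[str]], object],      # hypothetical: returns a trained LoRA
    generate: Callable[[object, int], List[str]],  # hypothetical: samples n images from it
    score: Callable[[str], float],                 # hypothetical: higher = closer to the style
    rounds: int = 3,
    per_round: int = 500,
    keep: int = 50,
) -> List[str]:
    """Grow a training set by feeding back the best generated images."""
    dataset = list(originals)
    for _ in range(rounds):
        lora = train(dataset)                      # train on everything collected so far
        candidates = generate(lora, per_round)     # e.g. 500 new images
        best = sorted(candidates, key=score, reverse=True)[:keep]
        dataset.extend(best)                       # keep only the closest matches
    return dataset
```

One thing to watch: every round trains on the model's own output, so whatever `score` lets through (plasticky textures included) gets reinforced. The selection step is doing all the work here.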

Out of curiosity, what do you think of these? https://imgur.com/a/8p7RlMe
>However it seems to have had no training on art-art. I can't get it to generate even something that looks like Degas, for instance

It feels like they just removed names from the datasets to make it worse at recreating famous people and artists.

No, they absolutely did not just do that in this case, although that was the SD plan. If you prompt for "painterly, oil painting, thick brush strokes, impressionistic oil painting style" to flux, you will get ... anime-ish renderings.
That's not what I'm talking about. With SDXL you can literally prompt a famous artist's entire style and mix and match them, even conceptual artists and sculptors.
I’ve had the same problem with photography styles, even though the photographer I’m going for is Prokudin-Gorskii who used emulsion plates in the 1910s and the entire Library of Congress collection is in the public domain. I’m curious how they even managed to remove them from the training data since the entire LoC is such an easy dataset to access.
Yes, exactly. I think they purposely did not train on stuff like this. I'd bet that you could do a LoRA of Prokudin-Gorskii though; there's a lot of photographic content in Flux's training set.
I'm fairly confident they did a broad FirstName LastName removal.
And I can't imagine there's a real copyright (or ethical) issue with including artwork in the public domain because the artist died over a century ago.
I think that's part of what makes FLUX.1 so good: the content it's trained on is very similar.

Diversity is a double-edged sword. It's a desirable feature where you want it, and an undesirable feature everywhere else. If you want an impressionist painting, then it's good to have Monet and Degas in the training corpus. On the other hand, if you want a photograph of water lilies, then it's good to keep Monet out of the training data.

DALL-E3 doesn't struggle with this. It's just opinions. There's no technical limitation. They chose to weaken the model in this regard.
Nonsense. FLUX.1-dev is famous for its consistency, prompt adherence, etc.; and it fits on a consumer GPU. That has to come with compromises. You can call any optimization weakness: that's the nature of compromise.
I wonder if part of the reason it's good is because it's been trained for a more specific task. I can only imagine that if your concept of a "house" ranges from a stately home to "a pineapple under the sea" you're going to end up with a very generalised concept. It then takes specific prompting to remove the influences you're not interested in.

I suspect the same goes for art styles. There's such huge variety that really they'd be better served by separate models.

There are people who undistilled Flux so it can be further finetuned, so adding art training won't be an issue.

https://huggingface.co/nyanko7/flux-dev-de-distill

I wonder if you can use Flux to generate the base image then img2img on SD1.4 to impart artistic style?
That's what a refiner is for in auto1111: taking an image through the last 10% of the process and touching it up with an alternative model.

I actually use Flux to generate an image for the sake of prompt adherence, then pull it in as a canny/depth ControlNet input with more established models like realvis, unstableXL, etc.
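
To make the "last 10%" refiner hand-off concrete, here is a tiny sketch of how a step budget might be split between a base model and a refiner. `split_steps` and `switch_at` are illustrative names, not auto1111's actual API, and the rounding rule is an assumption:

```python
from typing import Tuple


def split_steps(total_steps: int, switch_at: float = 0.9) -> Tuple[int, int]:
    """Split sampling steps between a base model and a refiner.

    switch_at is the fraction of steps the base model handles before the
    refiner takes over (0.9 leaves the last 10% to the refiner).
    Names and rounding are assumptions for illustration only.
    """
    if not 0.0 <= switch_at <= 1.0:
        raise ValueError("switch_at must be between 0 and 1")
    base_steps = round(total_steps * switch_at)
    return base_steps, total_steps - base_steps


print(split_steps(30))  # (27, 3)
```

With 30 steps, the base model would run 27 and the alternative model would touch up the final 3.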

That is an interesting idea, I somehow hadn't thought of using flux in a chain like that, thanks!
Yes, that is my current workflow as well.
>but, it's a real loss. Both in terms of human knowledge of, say composition, emotion, and so on, but also for style diversity

But that real art still exists, and can still be found, so what exactly is the loss here?

We may differ on our take about the usefulness of diffusion models, but I'd say it's a loss in that many of the visuals humans will see in the next ten years are going to be generated by these models, and I for one wish they weren't just trained on weeb shit.
Just think that before 1995 (and in reality, decades later than that) most of the world would never have access to 99% of the world's art.

And between 1995 and 2022 the amount of Art produced surpasses the cumulative output of all other periods of human history.

... And between 2022 and 2025 the amount of imagery generated will drive the percent of Art created to roughly 0% of all imagery.
You'll still be able to ask a person to create art in a specific style if you'd like.
Unfortunately, we will have a generation of young artists who learn to draw based on models like Flux, unless they get classical training.