Hacker News
by blueblisters 823 days ago
SD3.0 looks really good and will probably displace a lot of closed image generation models once it's out.

I don't understand why Stability gets so little support from the community. They released the first usable open-source models, and their models are the foundation of the most interesting AI-based workflows out there - VC funded or otherwise.

3 comments

Stability AI had massive support when Stable Diffusion 1.x and SDXL each hit the scene. But the Gen AI industry has evolved so rapidly that forks and iterations on the models outpaced them, especially commercially.

That's a feature of open-source development, not a bug. But it's a reason (along with the general financial issues which are the company's fault alone) why Stability is switching to a "need a membership to use commercially" business model, and IMO it won't work.

it's a (minor, not permanent) blow to open source ai that stability, and now mistral, felt the need to switch to closed for their leading models. openwashing is a crowdpleaser but everyone goes closed when things get serious. basically only meta is the leading hope. i hope more players emerge - but i also don't have a ton of ideas on how to fund them. these efforts take serious resources.

that said i think it's important to acknowledge how much stability has shared in its research, just the other day they were on HN for Stable Video 3D, not to mention hourglass diffusion and other Stable* models. may not be the overwhelming SOTA but it's real open source AI work that pushes the frontiers. you have to give them credit for that.

> these efforts take serious resources

Meta just published their new optimization results [1]. According to them

  > training a 7B model on 512 GPUs to 2T tokens using this method would take just under two weeks.
In this context a GPU is an NVIDIA A100, which you can buy, if you can buy, for $10000.

And this is after an explosion of ideas that led to optimizations that were unthinkable just two years ago.

If someone had trained such a model 2 years ago, it would have cost hundreds of millions. Now it's 5 million. Maybe in 2 years it's going to be only $50k. Should you start a startup now and invest $5 million, and risk someone stealing the show for pennies in 2 years? If you do, I really can't see how you can afford to open source the results of your training.

[1] training a 7B model on 512 GPUs to 2T tokens using this method would take just under two weeks.
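
For what it's worth, the "5 million" figure above matches the purchase price of the GPUs alone. A rough sketch of the arithmetic, assuming the $10,000 per-A100 price quoted above and rounding "just under two weeks" up to 14 days (the function names are made up for illustration; power, networking, staff, and failed runs are all ignored):

```python
# Back-of-envelope numbers using only the figures quoted in the comment above.
NUM_GPUS = 512              # from the quoted Meta result
PRICE_PER_GPU_USD = 10_000  # the A100 price the comment assumes
TRAINING_DAYS = 14          # "just under two weeks", rounded up (assumption)


def hardware_cost_usd(num_gpus: int, price_per_gpu: int) -> int:
    """Cost of buying the GPUs outright, nothing else."""
    return num_gpus * price_per_gpu


def gpu_hours(num_gpus: int, days: int) -> int:
    """Total GPU-hours if every GPU runs for the whole period."""
    return num_gpus * days * 24


cost = hardware_cost_usd(NUM_GPUS, PRICE_PER_GPU_USD)
hours = gpu_hours(NUM_GPUS, TRAINING_DAYS)

print(f"GPU purchase cost: ${cost:,}")  # 512 * 10,000
print(f"GPU-hours for one run: {hours:,}")  # 512 * 14 * 24
```

Renting instead of buying would change the picture, but the comment gives no rental price, so that is left out here.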

There is nothing you can run on your own computer that is even remotely as good as Stability's products.

Which means there is nothing with even remotely the same fine-tuning ecosystem.

And on that front, Stability is way ahead of the competition.

Yeah. It is weird. On one hand, v3 looks really good, the arch is sound, and the work is solid. On the other hand, selling Clipdrop is not a good sign (it is uncommon for a growing startup to divest). Emad likes to talk about how they were able to retain researchers, and now this, so it is hard to know what's going on.
The CEO being a fraud could be a possible reason. Just saying.
You shouldn't judge people by what Forbes says, but by what they do. So far, Emad delivers on what he promises (with some delays here and there), and Stability.ai is the only one that publishes image / video generation models at production quality.
Not just Forbes. Read about him on Wikipedia.
>However, according to him, he did not attend his graduation ceremony to receive his degrees, and therefore, he does not technically possess a BA or an MA.

Oh wow, he's probably lying about his education.

Good. I trust people who lie about their education and exceed the expected ability of a BA, vs. the huge numbers of barely articulate and completely disinterested CS graduates entering the workforce of late.
I received them by mail. He didn't?
Actually, he has recently received his BA and MA.
I mean, you get a nose for people like that after a while. They all have common characteristics like self-aggrandizement, making bold claims and then changing tune when caught, a history of burnt-out and salty ex-coworkers. It’s not that hard to spot in this guy’s case.
Did you read the article?

I saw everyone repeating this over and over, but then I actually read the article and couldn't even understand what the big deal was... hard to find a founder who hasn't done all the things he was accused of, none of which were really a big deal.

Felt completely overblown and honestly like a weird hit piece where the journo didn't actually find any real dirt to smear.

Certainly a shady player. Doesn't seem to be in good hands.
FWIW, they lost my interest entirely when they launched SD 2.0 and the model was worse. Presumably (this was what people at the time were coming up with as the core problem) they went overboard in their attempts at preventing it from generating naked photos, removing so much from the training set that it no longer seemed to understand as well what a human looked like. The wider community then just got stuck on SD 1.5, so frankly it isn't clear to me whether the work Stability was doing was even relevant anymore... even just a few weeks ago I saw some new thing using SD, said to myself "god I hope it is using 1.5", and it was in fact (thankfully) still using 1.5.
SDXL is amazing.

The community is entrenched in 1.5 because that's what everyone is now familiar with, IMO

>The community is entrenched in 1.5 because that's what everyone is now familiar with, IMO

That probably carries some weight in the community's decision to keep using 1.5. Other reasons (more important ones, IMO) why we're still stuck on 1.5 are the nerfing of 2.0 and the plethora of user-trained models based on 1.5.

I continue to be amazed by the quality possible with 1.5. While there are pros and cons to each of the offerings from other image generators, I haven't yet seen anything available to the public that can compete with the quality gens a competent SD prompter can produce.

SDXL seems to have taken off better than 2.0, but it's nothing so amazing as to justify leaving all the 1.5 models behind.

Well, personally, SDXL just blows 1.5 out of the water for me. I haven't had a reason to even touch 1.5 in months.

But note that SDXL is really awful in automatic1111 or vanilla HF diffusers for me. You have to use something with proper augmentations (like ComfyUI or Fooocus, which runs on ComfyUI).

>You have to use something with proper augmentations (like ComfyUI or Fooocus, which runs on ComfyUI)

Yeah, comfy was given a reference design of the sdxl model beforehand so it would be supported when sdxl was released. I should probably switch to comfy, but I don't touch the tech very frequently as I don't have a practical use case besides the coolness factor.

Ok, I'll try SDXL? But I continue to believe that it was the botched release of SD 2.x, and the attempts to push people onto it, that led to whatever is being talked about in this thread for why Stability gets "so little support from the community": I lost interest in what they were working on well before they released SDXL, as 2.x had left me no longer convinced that their newer stuff would be better than their older stuff.

FWIW, "everyone had gotten so used to 1.5 that they just didn't want to bother with 2.x" might provide a similar mechanism, if a very different place for the blame: if people aren't paying attention to the new stuff you are building, it is going to hurt your "support".