by lwneal 1202 days ago
The most incredible thing about this system is that it uses Stable Diffusion (the open source AI art generator), rather than DALL-E (the proprietary closed art generator owned by OpenAI).

The fact that even Microsoft, which partially owns OpenAI, is giving up on DALL-E shows the power of building an open-source community around models with published, downloadable weights.


> even Microsoft, which partially owns OpenAI, is giving up on DALL-E

hold on to your wild extrapolations there. this is a paper by 6 people from Microsoft Research Asia, which seems based out of China. 6 researchoors publishing a thing independently does not mean Microsoft "giving up on DALL-E".

Yeah I agree that GP is hyperbole. Still though, at the very least it shows that the researchers found it easier to work with weights they could run locally rather than via another API call.

I assume this is because DALLE2 still doesn’t provide embeddings and/or finetuning via API. In addition to likely being more expensive to run.

Happy to be corrected on any of this - I still haven’t read the paper.

So why hasn't there been an "as popular as ChatGPT" open source version?

Two things stand out. The first is Stable Diffusion itself: the weights and the supporting Python code were open sourced and released to the public. Anyone can find the ckpt file and the Python code online, download them, and (with a bit of work) run them on their own computer if they have the hardware (any reasonable GPU). There has been no such release by OpenAI; the closest we've got is Facebook's leaked LLaMA model, and that's not a chat bot. So we don't have a model to run. Stability AI/Emad paid to train the model, which cost something like half a million dollars in GPU time (he obviously didn't pay the retail price of $600k, but it's also not something you'd get right on the first shot), and then gave the output away. It's not clear how much it would cost to train a chat bot comparable to ChatGPT, but the impression is that it would take much more.

The second thing is that it's not clear that we, the Internet at large, would actually benefit from the model's release. Stable Diffusion is 4 gigs and able to run on all sorts of consumer-grade hardware, which led to such a Renaissance. ChatGPT makes liberal use of the Nvidia A100 GPUs available to OpenAI as compute in Azure. (AWS and GCP, along with many smaller AI-focused cloud companies, also offer these.) One of those costs, like, $10k, and you need several of them to run ChatGPT. So even if OpenAI were to live up to their name and release ChatGPT's model, only businesses and research labs would actually have the hardware to run it. It would be awesome to have the weights, but you wouldn't have the same army of developers able to work on it.
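To put rough numbers on the size gap described above, here is a back-of-the-envelope sketch in Python. It only counts the memory needed to hold the weights (parameters times bytes per parameter) and ignores activations and other runtime overhead. The parameter counts are assumptions for illustration: roughly 1B for Stable Diffusion (consistent with a 4 GB fp32 checkpoint), the 7B and 65B LLaMA sizes mentioned in this thread, and GPT-3's published 175B as a stand-in, since ChatGPT's actual size is not public. The 80 GB per-GPU figure assumes the larger A100 variant.

```python
import math

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_gb(n_params: float, dtype: str = "fp16") -> float:
    """Memory for the weights alone, in GB (10^9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def gpus_needed(n_params: float, dtype: str = "fp16", gpu_gb: float = 80.0) -> int:
    """Minimum number of GPUs whose combined memory can hold the weights."""
    return math.ceil(weight_gb(n_params, dtype) / gpu_gb)


# Illustrative parameter counts (assumptions, see the note above).
models = {
    "Stable Diffusion (~1B, fp32)": (1e9, "fp32"),
    "LLaMA 7B (fp16)": (7e9, "fp16"),
    "LLaMA 65B (fp16)": (65e9, "fp16"),
    "GPT-3-sized 175B (fp16)": (175e9, "fp16"),
}

for name, (n_params, dtype) in models.items():
    print(f"{name}: {weight_gb(n_params, dtype):.0f} GB of weights, "
          f"at least {gpus_needed(n_params, dtype)} x 80 GB GPU(s)")
```

Under these assumptions a 175B-parameter model needs about 350 GB for fp16 weights alone, i.e. at least five 80 GB cards, while the image model fits on a single consumer GPU.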

There are open source LLM chat bots out there, so I think we will see one become popular, but at least that explains why it'll be a second before we do.

I have been fooling around with the small 7B llama models. They chat, but they are pretty dumb compared to ChatGPT. By that I mean they are terser and they confabulate more, even for things that are common knowledge. It seems, from asking it questions about current events, that the model was trained on data up to early 2020.

I haven't seen much output yet from the biggest 65B parameter llama model. One can rent a cloud VM on vast.ai that can run it for $1.25 an hour or so, but ChatGPT is $20 a month, so why bother, unless you like the fully uncensored aspect.
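For what it's worth, the rent-vs-subscribe comparison above comes down to one division. A minimal sketch using only the two prices quoted in the comment ($1.25 an hour and $20 a month); the function name is made up for illustration:

```python
def break_even_hours(subscription_per_month: float, rental_per_hour: float) -> float:
    """Hours of rented GPU time per month that cost the same as the subscription."""
    return subscription_per_month / rental_per_hour


hours = break_even_hours(subscription_per_month=20.0, rental_per_hour=1.25)
print(f"Renting costs more than the subscription after {hours:.0f} hours a month")
```

So at those prices, renting only comes out cheaper if you use the VM for less than 16 hours a month.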

Likely training and running cost.

Most AI art generators are run on single GPU and can be trained with some top-of-the-line consumer hardware. Expensive but accessible.

A full-blown LLM like ChatGPT literally costs as much as a small startup to build and train. Running it is nearly impossible without cards like the A100, each of which costs more than a full enthusiast-grade PC.

Maybe eventually they will distill and optimize the models so that we can fit these things on a PC, then laptop, and then phones. But for now it is exclusively the domain of big tech.

> Likely training and running cost.

Why wasn't this a problem for StableDiffusion vs DALL-E?

They are smaller models with less parameters. Their small size relative to LLMs also lets people play around with them and tune them to run on less expensive hardware, if the weights are available, i.e. open source like SD.

Originally SD was quite hard to run, with an 8GB high-end card only outputting 256x256 images. Then AMD and NVIDIA started releasing 16GB and 24GB consumer cards, and people started training on those GPUs and tuning their own models. Now we have plenty of cards and models that can do 512x512.

> They are smaller models with less parameters.

I wouldn't have guessed image is a smaller model/easier to manipulate/generate than text.

Stable Diffusion will run on any decent gaming GPU or a modern MacBook, meanwhile LLMs comparable to GPT-3/ChatGPT have had pretty insane memory requirements - e.g., <https://github.com/facebookresearch/metaseq/issues/146>

Worth noting that the M-series MacBooks are UMA, so 100GB of VRAM is costly but easily accessible. Their GPU performance is nowhere near a 96GB A100, but for sheer VRAM it’s a good choice.

DALL-E has given a new "experimental" model to a few lucky users. It looks a bit better, but it seems to have less variety currently. I don’t think they will catch up with the competition. ControlNet is so good, and I guess that Midjourney 5 and Stable Diffusion 3 are going to be fully released first.

Millions of images have been removed from the training set for SD3, which is why a lot of people are sticking to 1.5.

1.5 is a lot more stupid than 2.1 in my experience.

Some models based on 1.5 produce good-looking images when they produce what you asked for, but they often miss on more complex compositions.

We are only starting to see good 2.1 models, like the Illuminati one. I have high hopes for version 3, and I hope people will fine-tune it to their desires (which seem to be mainly young-looking women with unrealistic bodies).

Not giving up sadly, recently they launched bing.com/create.
I also have a website that integrates Stable Diffusion with ChatGPT.

https://aidev.codes

Now everyone will go ahead and bury my comment.

You're getting a lot of downvotes and backlash because of the way that you're posting. I understand how you feel, though. It's very personal if you pour effort into something and people aren't interested. It plays with your sense of self-worth. It's a nasty feeling. It's easy to be bitter.

I'm not going to check out the site because I'm not interested in ML generation of websites, or even manual creation of them. I'm here because I'm interested in the discussion around LLMs. That doesn't mean it doesn't have value, though.

If you're going to follow this entrepreneurial path you need to develop a thick skin and learn to cope with rejection. You're going to get things wrong a lot, and a large amount of your effort will be "wasted" trying things that don't work out. You need to learn from your mistakes and understand your own strengths and weaknesses (eg. if you're not good at marketing, involve someone who is). If you want something where your efforts will reliably be rewarded you need to get a regular job instead.

I upvoted, fwiw.

I get an error when I try to log in - literally the response is the text "Error" and nothing else, not even HTML!
I know it's not a useful message. It's one of the more obvious things to be improved. Sorry about that. I will see if I can quickly improve it.

OK I fixed the message -- that actually means invalid username.. whoops.. OR invalid password. Lol. What user is it?

It will say which error now at least.