Hacker News
by rawrmaan 1138 days ago
There was a lot of detail and data in here, but it's not very useful to me because all of the comparisons are to things I have no experience with.

There's really only one thing I care about: How does this compare to GPT-4?

I have no use for models that aren't at that level. Even though this almost definitely isn't at that level, it's hard to know how close or far it is from the data presented.


None of the 3B and 7B models are at ChatGPT’s level, let alone GPT-4. The 13B models start doing really interesting things, but you don’t get near ChatGPT results until you move up to the best 30B and 65B models, which require beefier hardware. Nothing out there right now approximates GPT-4.

The big story here for me is that the difference in training set is what makes the difference in quality. There is no secret sauce; the open source architectures do well, provided you give them a large and diverse enough training set. That would mean it is just a matter of pooling resources to train really capable open source models. That makes what RedPajama is doing, compiling the best open dataset, very important for the future of high-quality open source LLMs.

If you want to play around with this yourself, you can install oobabooga and figure out which model fits your hardware from the LocalLLaMA subreddit wiki. The llama.cpp 7B and 13B models can be run on CPU if you have enough RAM. I’ve had lots of fun talking to 7B and 13B Alpaca and Vicuna models running locally.

https://www.reddit.com/r/LocalLLaMA/wiki/models/

LLaVA 13B is a great multimodal model that has first class support in oobabooga too.

It's really fun to enable both the Whisper extension and the TTS extension and have two-way voice chats with your computer while also being able to send it pictures. Truly mind-bending.

Quantized 30B models run at acceptable speeds on decent hardware and are pretty capable. My understanding is that the open source community is iterating extremely fast on small model sizes, getting the most out of them by pushing data quality higher and higher, and then plans to scale up to models of at least 30B parameters.
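For anyone wondering what "quantized" means here: the weights are stored as small integers plus a scale factor instead of 16-bit floats, which is what lets a 30B model fit on consumer hardware. Below is a minimal Python sketch of symmetric round-to-nearest 4-bit quantization with one scale per block of weights. The function names and block layout are illustrative assumptions; the actual formats used by llama.cpp or GPTQ differ in block size, rounding, and bit packing.

```python
# Minimal sketch of symmetric round-to-nearest quantization.
# Assumption: one float scale per block of weights. Real formats
# (e.g. llama.cpp's) differ in block size, rounding and bit packing.

def quantize_block(weights: list[float], bits: int = 4) -> tuple[float, list[int]]:
    """Map floats to signed ints in [-(2**(bits-1) - 1), 2**(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    # The largest-magnitude weight maps to +/-qmax; guard against an all-zero block.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return scale, q


def dequantize_block(scale: float, q: list[int]) -> list[float]:
    """Recover approximate floats; each is off by at most scale / 2."""
    return [scale * v for v in q]


if __name__ == "__main__":
    block = [0.12, -0.5, 0.33, 0.0, 0.71, -0.07]
    scale, q = quantize_block(block)
    restored = dequantize_block(scale, q)
    err = max(abs(a - b) for a, b in zip(block, restored))
    print(q)
    print(f"scale={scale:.4f}, max abs error={err:.4f}")
```

Each weight now needs 4 bits plus a share of the block's scale instead of 16 bits, at the cost of a rounding error bounded by half the scale.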

I really can't wait to see the results of that process. In the end you're going to have a 30B model that's totally uncensored and is a mix of Wizard + Vicuna. It's going to be a veryyyy capable model.

I usually even prefer GPT-3.5, as it's faster and much cheaper. GPT-4 is great for the hardcore logical reasoning, but when I want something that knows to turn my lights on and turn the TV to a channel, it's overkill.
> The llama.cpp 7B and 13B models can be run on CPU if you have enough RAM.

Bigger ones as well, you just have to wait longer. Nothing for real time usage, but if you can wait 10-20 minutes, you can use them on CPU.

It's not even that bad. Core i7-12700K with DDR5 gives me ~1 word per second on llama-30b - that is fast enough for real-time chat, with some patience. And things are even better on M1/M2 Macs.
The critical factor seems to be the ability to fit the whole model in RAM (the --mlock option in oobabooga). With Apple's RAM prices, most M1/M2 owners probably don't have the 32 GB of RAM required to fit a 4-bit 30B model.
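The 32 GB figure follows from simple arithmetic: parameters × bits per weight ÷ 8 gives the bytes needed for the weights alone. The sketch below assumes a 20% overhead factor for quantization scales, the context cache, and runtime buffers; that factor is a rough guess, not a measured number, and real usage depends on the format and context length.

```python
def model_memory_gb(n_params: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Rough RAM estimate in GB (1e9 bytes) to hold a model in memory.

    overhead is an ASSUMED fudge factor covering quantization scales,
    the context (KV) cache and runtime buffers; it is not measured.
    """
    weight_bytes = n_params * bits_per_weight / 8  # bits -> bytes
    return weight_bytes * overhead / 1e9


if __name__ == "__main__":
    for name, n in [("7B", 7e9), ("13B", 13e9), ("30B", 30e9)]:
        print(f"{name}: ~{model_memory_gb(n, 4):.1f} GB at 4-bit")
```

Under these assumptions a 4-bit 30B model lands around 18 GB, which is more than a 16 GB machine can hold once the OS takes its share, so the next common configuration up is 32 GB.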
I have 64 GB RAM, but only a Ryzen 5 3600, and the larger models are very slow ;)
Do these red pajama models work with llama.cpp?
The naming is confusing... these models aim to equal or beat LLaMA by reproducing the training data and methodology that were used for LLaMA.

But the actual model architecture is slightly different, based on Pythia

I guess what is needed is a pythia.cpp https://github.com/ggerganov/llama.cpp/issues/742

No, llama.cpp only works with LLaMA-based models, like base LLaMA, Alpaca, Vicuna, ...
The bit I liked best was the response examples. Look at those. They're clearly not as good as GPT-4, but I feel they're good enough that, in a scenario where you care about privacy or data provenance, this would be a contender.

For example, a therapist, a search bot for your diary, or a company intranet help bot: anything where the prompt contains something you don’t want to send to a third party.

That's a great point, I definitely overlooked these. They look pretty good, too, and I agree with your use cases.

Thanks!

Then you probably don't care about this (yet)

Assume a truly competitive model in the Open Source world is still a ways off. These teams and their infrastructure are still in their early days while OpenAI is more at the fine-tuning and polishing stage. The fact that these open teams are able to have something in the same universe in terms of functionality this fast is pretty amazing... but it will take time before there's an artifact that will be a strong competitor.

The pace of progress the open source models are making is pretty astonishing. The smaller model sizes are cheap to train, so there is a lot of iteration by many different teams. People are also combining proven approaches. Then they're going to nail it and scale it. It will be very interesting to see where we are in 3 months' time.
There's a nice chart in the leaked Google memos that compares some of the open models against ChatGPT and Bard so you can get a sense where these models land by comparing them to these.

https://twitter.com/jelleprins/status/1654197282311491592

> How does this compare to GPT-4?

I'll give you the answer for every open source model over the next 2 years: it's far worse.

If you'd said that about OpenAI's DALL-E 2 you'd have been wrong.

I suspect Open Source LLMs will outpace the release version of GPT-4 before the end of this year.

It's less likely they will outpace whatever version of GPT-4 is shipped later this year, but still very much possible.

Open source LLMs might do that, but I very much doubt that those models will be small enough to run even on high-end consumer hardware (like say RTX 3090 or 4090).
The way they'll do it, if they do it at all, is to find a way to squeeze the capability into smaller models and get much faster at executing them. That's where the market forces are.

That's exactly the core of the email that leaked out of Google: it's proving far better to be able to have lots of people iterating quickly (which necessarily means broad access to the necessary hardware) than to rely on massive models and bespoke hardware.

I'd anticipate something along the lines of a breakthrough in guided model shrinking, or some trick in partial model application that lets you radically reduce the number of calculations needed. Otherwise whatever happens isn't as likely to come out of the open source LLM community.

> it's proving far better to be able to have lots of people iterating quickly (which necessarily means broad access to the necessary hardware) than to rely on massive models and bespoke hardware

Very true, but can't Google just wait, take the findings from the open source LLM community, and quickly update their models on their huge clusters? It's not as if they will lose the top position; that has already happened.

Yes and no. Some of the optimisation techniques that are being researched at the moment use the output of larger models to fine-tune smaller ones, and that sort of improvement can obviously only be one-way. Same with quantising a model beyond the point where the network is trainable. But anything that helps smaller models run faster without appealing to properties of a bigger model that has to already exist? Absolutely yes.
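The "use the output of larger models to fine-tune smaller ones" idea is usually called knowledge distillation. Below is a minimal sketch of the classic soft-target loss: both models' logits are softened with a temperature T, and the student is penalized by the KL divergence from the teacher's distribution, scaled by T². This illustrates the general technique, not what any particular open model used; many of the current open fine-tunes instead train directly on text generated by a larger model.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; higher temperature gives a flatter distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: list[float], student_logits: list[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T**2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]
    print(distillation_loss(teacher, [1.9, 1.1, 0.2]))  # student close to teacher
    print(distillation_loss(teacher, [0.1, 1.0, 2.0]))  # student far from teacher
```

Because the loss needs the teacher's outputs, the bigger model has to exist first, which is why this kind of improvement only flows one way.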
That seems way off the mark.

Open source models can already approximate GPT-3.5 for most tasks on common home hardware, right now.

Okay, so "ignore my out of touch opinion of language models". Got it.