by refulgentis 1176 days ago
I've been on leave from work and hammering the GPT APIs since GPT 3.5/ChatGPT was made available.

The local LLM stuff was a tad out of control from the drop, too many people hand-waving about how they could get the 7B running on a phone with quantization, but it was unintelligible, and not "no-RLHF" unintelligible. Just FUBAR'd.

I tried the latest round of RLHF'd models yesterday, and I'm officially, publicly a skeptic now. These are an awful idea: training on ShareGPT gets horrible results. I'm seeing it emit the same exact answers ChatGPT does, but only a small fraction of them.

I understand that it is itself impressive for a certain crowd, and I concede it's an accomplishment. However, it's an accomplishment that enables no further accomplishment: using a stolen model to do minimal RLHF that is really just overfitting on a subset of answers from another AI. That's not RLHF at all. Even if it were, RLHF isn't something you do in a weekend for $100, and pretty much everyone outside OpenAI and Anthropic is learning that.

2 comments

In my experience, the smaller models are almost completely worthless as-is. 65B is the only decent one (I'd say just behind gpt-3.5-turbo; obviously it's not instruction tuned, but I mean the coherency of the core language model), and understandably people aren't really paying attention or devoting many resources to the largest one. 30B shows promise for specific tasks with fine-tuning, but 7B and 13B are just toys.
How would you judge Open Assistant's approach?
I don't know much about it specifically, but I heartily endorse it.

LAION was instrumental in early-ish AI art. I will always cherish & remember when you had like 14 people in an IRC room just playing around, fall/winter 2020. Now 3 of them have companies around it, and the resources that were there to enable e.g. SD are similarly interested in LLMs.

This is excellent: open source is the way forward; there just needs to be more coordination, expertise, and patience involved. SNR is way too low in general public spaces like HN right now.

(I'm being exclusively negative, so in recompense: we're at year 1 of 100, and the people wasting a ton of time replicating a proof-of-concept crappy "RLHF" run and rushing to post are learning too. The eternal golden summer starts now, and anything anyone is doing is helpful.)