HN Mirror

	by sebzim4500 1208 days ago
	Why does there need to be a way out? Everyone just seems to assume that feeding model output into the training set is going to break things, but I don't get why. AlphaZero learned to play chess and Go by training purely on its own self-play data. Why is inserting the best outputs from GPT-4 into the training set for GPT-5 expected to make things worse? To me, it sounds like it could even be desirable.

marginalia_nu 1208 days ago

In chess there is a very clear victory state, and a scoring function can be implicitly defined from a large number of games of various skill levels.

You really don't throw two sentences into the thunder dome to decide which one "wins". That means language-model training is much more susceptible to being poisoned.
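
One well-known way to get a score for chess players from nothing but game results is the Elo rating system. A minimal sketch, assuming the standard 400-point logistic scale and an illustrative K-factor of 32; the player names and results are made up, and this shows how results can define a score, not how AlphaZero itself was trained.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B on the standard Elo (400-point) scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """Return new ratings after one game; score_a is 1 (A won), 0.5 (draw) or 0 (A lost).

    The result comes straight from the rules of the game: no judge is consulted.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Hypothetical game record: (player A, player B, A's score).
games = [("alice", "bob", 1.0), ("bob", "carol", 1.0), ("alice", "carol", 1.0), ("carol", "bob", 0.5)]
ratings = {"alice": 1500.0, "bob": 1500.0, "carol": 1500.0}
for a, b, score in games:
    ratings[a], ratings[b] = update(ratings[a], ratings[b], score)
print(ratings)
```

Every rating change here is driven by a result the rules already decided, which is the property marginalia_nu says text lacks.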

sebzim4500 1208 days ago

>You really don't throw two sentences into the thunder dome to decide which one "wins".

That's almost literally what RLHF is, though, and that is the last step of training GPT-n. Then, when GPT-{n+1} is being trained, its training set will include some outputs from GPT-n, so it will benefit from that fine-tuning even before it goes through its own round of RLHF. Also, on average, good outputs of GPT-n are more likely to be included in the training set of GPT-{n+1} (because they end up as a BuzzFeed article or a top post on Reddit or something), so there is an additional signal beyond the above.
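
The "two sentences in the thunder dome" setup corresponds to how RLHF reward models are commonly trained: a human picks the better of two responses, and the reward model is fit with a Bradley-Terry style loss, -log sigmoid(r_chosen - r_rejected) (the setup described in OpenAI's InstructGPT paper). The sketch below is a toy under stated assumptions: a linear reward over hypothetical hand-made features and plain gradient descent, whereas real reward models are neural networks scoring full text. Unlike a chess result, the "winner" in each pair exists only because a human labeled it.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def reward(w: list[float], features: list[float]) -> float:
    # Hypothetical linear reward model: scores one response from made-up features.
    return sum(wi * fi for wi, fi in zip(w, features))


def pairwise_loss(w: list[float], chosen: list[float], rejected: list[float]) -> float:
    # Bradley-Terry style loss: -log P(chosen beats rejected) = -log sigmoid(r_chosen - r_rejected).
    return -math.log(sigmoid(reward(w, chosen) - reward(w, rejected)))


def train(pairs, dim: int, lr: float = 0.1, epochs: int = 200) -> list[float]:
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            # d(loss)/d(margin) = -(1 - sigmoid(margin)); d(margin)/dw = chosen - rejected.
            grad_scale = -(1.0 - sigmoid(margin))
            w = [wi - lr * grad_scale * (c - r) for wi, c, r in zip(w, chosen, rejected)]
    return w


# Each pair: (features of the response a human preferred, features of the rejected one).
# Hypothetical features: [answers_the_question, is_rude].
pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.8, 0.1], [0.3, 0.0]), ([0.5, 0.0], [0.5, 0.9])]
w = train(pairs, dim=2)
print(w, [round(pairwise_loss(w, c, r), 3) for c, r in pairs])
```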

simonh 1208 days ago

I suspect the comment about the thunder dome was a reference to RLHF. On the one hand, RLHF seems far superior to the kind of prompt engineering Microsoft appears to have relied on with Sydney. On the other hand, it's doubtful that the manual selection in RLHF always selects for quality rather than, at least to some significant extent, pandering to whatever biases or preferences the humans in the training loop might have.

Y_Y 1208 days ago

That's not what RLHF is. In the thunderdome, as in chess, you don't need human judges or an oracle to know who's won. That makes a significant difference to the training procedure.
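
Y_Y's point can be made concrete with the smallest possible game. In tic-tac-toe, used here only as an illustrative stand-in for chess, the result is a pure function of the board, so a self-play loop could label its own games with no human judge. A minimal sketch, assuming a 9-character board string read row by row:

```python
from typing import Optional

# All rows, columns and diagonals of a 3x3 board, as indexes into a 9-character string.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6), (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def outcome(board: str) -> Optional[str]:
    """Return "X" or "O" for a win, "draw" for a full board, or None if play continues.

    The board uses "X", "O" and "." (empty), read row by row.
    """
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    if "." not in board:
        return "draw"
    return None


print(outcome("XXXOO...."), outcome("XOXXOOOXX"), outcome("X........"))
```

The rules alone produce the training signal; nothing comparable exists for deciding which of two sentences is "better".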

geraneum 1208 days ago

That’s correct. I have seen the above argument a lot: using an analogy as a basis for proof!

simonh 1208 days ago

> Why is inserting the best outputs from GPT-4 into the training set for GPT-5 expected to make things worse?

Firstly what makes you think only the best output from 4 will go into future training sets? It’s just as likely to be the most bizarre or ludicrous, or dangerous output that gets shared and discussed.

But also, how will v5 get to be better than v4 if it’s trained significantly on v4 output? It would just end up being trained to be the same, with the same flaws and quirks reinforced.

We already know v4 just makes stuff up; it’s incredibly good at producing well-formatted, plausible-looking, but utterly factually incorrect output. That’s because it has no concept of truth or facts. All it knows from the token sequence weightings is the form of language, not the content. Feeding that back into future models is the last thing we should be doing.
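
The "same flaws and quirks reinforced" worry resembles what is sometimes called model collapse. Here is a deliberately crude toy, assuming the "model" is nothing but token frequencies: each generation is fit to the previous generation's output and then sampled to produce the next corpus. It is not how GPT models are trained; it only shows one mechanism by which recursive training loses information.

```python
import random
from collections import Counter


def next_generation(corpus: list[str], size: int, rng: random.Random) -> list[str]:
    """'Train' on corpus by counting tokens, then 'generate' a new corpus by sampling those counts."""
    counts = Counter(corpus)
    tokens = list(counts)
    weights = [counts[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=size)


rng = random.Random(0)
corpus = [f"tok{i}" for i in range(50)] * 2  # 50 distinct tokens, each appearing twice
support = [len(set(corpus))]
for _ in range(30):
    corpus = next_generation(corpus, len(corpus), rng)
    support.append(len(set(corpus)))
print(support)  # number of distinct tokens surviving in each generation
```

Any token that fails to be sampled once can never reappear, so the vocabulary can only shrink from one generation to the next.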

sebzim4500 1208 days ago

>Firstly what makes you think only the best output from 4 will go into future training sets? It’s just as likely to be the most bizarre or ludicrous

That's true now, because LLMs are new, so the failure cases are still interesting. If we are talking about a hypothetical world in which LLM outputs are a significant portion of the internet, then most of it would be from Reddit comments/tweets/HN posts/BuzzFeed articles/etc.

Then, if you take only the ones with above-average views/upvotes/etc., you should expect to get the 'best' results.
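
Whether this filter works depends on how well upvotes track quality. A toy simulation, assuming quality and upvotes are jointly Gaussian with correlation rho; that is an illustrative assumption, not a measured property of Reddit or HN.

```python
import math
import random
import statistics


def mean_quality_kept(rho: float, n: int = 100_000, seed: int = 0) -> float:
    """Mean quality of posts kept by an above-average-upvotes filter.

    Toy assumptions: quality ~ N(0, 1), and upvotes = rho * quality + noise,
    scaled so that the correlation between quality and upvotes is rho.
    """
    rng = random.Random(seed)
    posts = []
    for _ in range(n):
        quality = rng.gauss(0.0, 1.0)
        upvotes = rho * quality + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)
        posts.append((quality, upvotes))
    threshold = statistics.fmean(u for _, u in posts)
    kept = [q for q, u in posts if u > threshold]  # keep only above-average upvotes
    return statistics.fmean(kept)


for rho in (0.9, 0.5, 0.1):
    print(rho, round(mean_quality_kept(rho), 3))
```

In this toy the lift in quality scales with rho: a strong proxy gives a clear improvement, while a weak one barely beats picking posts at random.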

simonh 1208 days ago

I'm still not convinced that's a reliable indicator of quality. It's potentially a measure of popularity or entertainment value, or maybe of pandering to preconceptions, but that's not at all the same thing.

Ask yourself what from-scratch metrics for quality you would like to select for. Then consider the likely or possible criteria people actually have for upvoting stuff on Reddit. I think you'll find there is probably very little correlation between those. This is called the alignment problem, and it's very hard to get right.
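
"Very little correlation" can be quantified under a simple assumption. Suppose quality $q$ and upvotes $u$ are standard normal with correlation $\rho$ (an illustrative assumption), and let $\varphi$ and $\Phi$ be the standard normal density and CDF. Keeping only above-average posts means conditioning on $u > 0$:

```latex
\begin{aligned}
\mathbb{E}[q \mid u] &= \rho\,u, \\
\mathbb{E}[q \mid u > 0] &= \rho\,\mathbb{E}[u \mid u > 0]
  = \rho\,\frac{\varphi(0)}{1 - \Phi(0)}
  = \rho\,\frac{1/\sqrt{2\pi}}{1/2}
  = \rho\sqrt{\frac{2}{\pi}} \approx 0.80\,\rho .
\end{aligned}
```

So with $\rho = 0.1$, the selected posts are only about 0.08 standard deviations better than average in this model.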

rnosov 1208 days ago

Correct output is desirable. If you feed it nonsense, whether human- or AI-generated, you might break it.

JieJie 1208 days ago

Then we should encourage labeled ChatGPT content like ShareGPT, which can easily be excluded from future datasets because it is clearly labeled as AI-generated.

It's the stuff that isn't labeled as generated with ChatGPT et al. that will enter future training sets. I personally believe that's taking the "lossy JPEG" analogy too far, but I'm not an AI researcher.
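
JieJie's point is easy to see in code: excluding labeled content is a trivial filter, while unlabeled generated text looks like anything else to it. A minimal sketch; the `Document` fields, the `"sharegpt"` source tag, and the label flag are all hypothetical, since real datasets record provenance in many different ways.

```python
from dataclasses import dataclass

# Hypothetical set of sources whose content is known to be AI-generated.
AI_LABELED_SOURCES = {"sharegpt"}


@dataclass
class Document:
    text: str
    source: str  # hypothetical field naming where the text was scraped from
    labeled_ai_generated: bool = False  # hypothetical flag set by the publishing site


def drop_labeled_ai_text(docs: list[Document]) -> list[Document]:
    """Keep only documents that carry no AI-generated label.

    Unlabeled generated text passes straight through, since nothing marks it.
    """
    return [
        d for d in docs
        if not d.labeled_ai_generated and d.source not in AI_LABELED_SOURCES
    ]


corpus = [
    Document("human forum post", "web"),
    Document("shared chat transcript", "sharegpt"),
    Document("blog post marked as AI-written", "web", labeled_ai_generated=True),
    Document("unlabeled AI-written blog post", "web"),
]
print([d.text for d in drop_labeled_ai_text(corpus)])
```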
