HN Mirror

	by toxik 572 days ago
	Why do you think that? InstructGPT was predominantly trained as a next-token predictor on whatever soup of data OpenAI had curated at the time. The alignment signal (both the RL part and the supervised prompt/answer pairs) is a tiny fraction of the total gradient.