| HN Mirror

	by cjbprime 1119 days ago
	Your understanding of alignment is somewhat out of date. Training a model to produce human-valued responses and training a model not to decide to destroy all the humans are not separate problems. RLHF may actually be an excellent solution to many of the problems you care about for today's LLMs, even though it is done for a practical reason (we want LLMs that give useful answers to our questions) rather than for an existential-risk reason.