| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by lllllm 1217 days ago
	The current systems like chatGPT actually have just such two parts. One is the raw LLM as you describe. The second one is another network acting as a filter on top of the first one. To be more precise, that second part is the process of finetuning with Reinforcement Learning from Human Feedback (RLHF). It trains a reward model to say if the first one was good or bad. Currently it's done very similarly to standard supervised learning (with human labelling) to say if the first model behaved good or bad, aligned or not with 'our' values. Anyway, while I remain sceptical about the roles of these in-flesh hemispheres, the artificial chatGPT-like systems indeed do have such left and right parts