| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by CGamesPlay 286 days ago
	I think this is on the right track, but I think it's a byproduct of the reinforcement learning, rather than something hard-coded. Basically, the model has to train itself to follow the user's instruction, so by starting a response with "You're absolutely right!", it puts the model into the thought pattern of doing whatever the user said.

1 comments

layer8 286 days ago

"Thought pattern" might be overstating it. The fact that "You're absolutely right!" is statistically more likely to precede something consistent with the user's intent than something that isn't, might be enough of an explanation.

link