HN Mirror



	by meow_mix 1191 days ago
	Reinforcement learning with human feedback (RLHF). What you guys are describing is the alignment problem.

1 comment

mistymountains 1191 days ago

That’s just a supervised fine-tuning method to skew outputs favorably. I’m actually working with it on biologics modeling, using laboratory feedback. The underlying inference structure is not changed.
