by meow_mix 1191 days ago
Reinforcement learning with human feedback. What you guys are describing is the alignment problem.
1 comment

That’s just a supervised fine-tuning method to skew outputs favorably. I’m actually working with it on biologics modeling using laboratory feedback. The underlying inference structure is not changed.