Hacker News
meow_mix
1191 days ago
Reinforcement learning from human feedback (RLHF). What you guys are describing is the alignment problem.
mistymountains
1191 days ago
That’s just a supervised fine-tuning method to skew outputs favorably. I’m actually working with it on biologics modeling, using laboratory feedback. The underlying inference structure is not changed.
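The point that feedback tuning reweights outputs without changing how inference works can be illustrated with a toy sketch. It makes several assumptions. The "model" is a hypothetical vector of logits over three candidate outputs. The rater is a hypothetical function that returns a reward. The update is a plain REINFORCE policy-gradient step, used here as a simple stand-in for the reward-model-plus-PPO setup commonly used for RLHF on language models. Only the parameters (logits) change, and the `sample` function is the same before and after tuning.

```python
import math
import random

# Toy sketch (assumption): the "model" is just a list of logits over candidate outputs.

def softmax(logits):
    # Numerically stable softmax: turns logits into a probability distribution.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample(logits, rng):
    # The inference procedure. Feedback tuning never modifies this function.
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

def reinforce_step(logits, action, reward, lr=0.5):
    # REINFORCE update: gradient of log p(action) w.r.t. logit j is 1[j == action] - p_j.
    probs = softmax(logits)
    return [z + lr * reward * ((1.0 if j == action else 0.0) - p)
            for j, (z, p) in enumerate(zip(logits, probs))]

def tune_with_feedback(logits, feedback, steps=200, seed=0):
    # Sample an output, score it with the (hypothetical) feedback function,
    # and nudge the logits toward outputs that received reward.
    rng = random.Random(seed)
    for _ in range(steps):
        action = sample(logits, rng)
        logits = reinforce_step(logits, action, feedback(action))
    return logits

if __name__ == "__main__":
    prefers_third = lambda a: 1.0 if a == 2 else 0.0  # hypothetical rater
    tuned = tune_with_feedback([0.0, 0.0, 0.0], prefers_third)
    print([round(p, 3) for p in softmax(tuned)])
```

After tuning, probability mass shifts toward the rewarded output, while `sample` and `softmax` are untouched. In this toy, "skewing outputs favorably" amounts to moving parameters, not changing the inference structure.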