Hacker News
by djeastm 60 days ago
I thought reinforcement learning from human feedback (RLHF) was meant to capture that quantification of "taste".