by irickt 98 days ago
HN as a huge RLHF data source for our behavior refinement. Yum!

(Reinforcement learning from human feedback)