Hacker News
by hamiecod 186 days ago
That's a strong RL technique that could match the quality of RLHF.