by byyoung3 301 days ago
This seems to disagree with a lot of research showing that RL is not necessary for reasoning. I'm not sure about alignment, though.