Hacker News
by d0mine 5 days ago
As I understand it, RL makes foundation models stupider (less capable, not more) but better at following instructions.