by KTibow 382 days ago
These days, they'll sometimes also run RL on a task if its outputs are easy to validate and it seems worth the effort.
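
To make "easy to validate outputs" concrete, here's a rough sketch of the kind of check that could serve as an RL reward signal. This is only an illustration: the function name `verifiable_reward`, the exact-match rule, and the "answer is on the last line" convention are all my own assumptions, not anything a particular lab is known to use.

```python
def verifiable_reward(model_output: str, expected: str) -> float:
    """Hypothetical reward: 1.0 if the last line of the output equals the expected answer."""
    lines = model_output.strip().splitlines()
    # Assumption: the final answer sits on the last non-empty line of the output.
    final = lines[-1].strip() if lines else ""
    return 1.0 if final == expected.strip() else 0.0


# Score a few made-up sampled outputs against a known answer.
samples = ["Let me think.\n42", "I guess 41", ""]
rewards = [verifiable_reward(s, "42") for s in samples]
print(rewards)  # [1.0, 0.0, 0.0]
```

The point is just that when a check like this is cheap and reliable, you can score lots of samples automatically, which is what makes RL on the task practical.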