Hacker News
DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL (pretty-radio-b75.notion.site)
19 points by mluo 498 days ago