by bryan0 509 days ago
Yes, but these steps were not used in R1-Zero, where its reasoning capabilities were trained.

And as a result, R1-Zero is way too crude to be used directly, which is a good indication that it remains relevant.