Hacker News
by enum 491 days ago
The problems are not important, but they illustrate failures that are. For example:

- The paper has an example where the model reasons "I'm frustrated" and then produces an answer that it "knows is wrong". You wouldn't know it if you didn't examine the reasoning tokens.

- There are two examples where R1 often gets stuck "thinking forever".

If these failures happen on these questions, where else can they happen? We'll start to find out soon enough.