by Kinrany 489 days ago
The true test would be seeing whether the behavior changes depending on the presence of reasoning.

The words “thinking” and “reasoning” used here are imprecise. It’s just generating text like always. If the text comes after “ai-thoughts:”, it’s “thinking”; if it comes after “ai-response:”, it’s “responding”, not “thinking”. Either way, it is always a big ole model choosing the most likely next token, potentially with some random sampling.
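A minimal sketch of that point, assuming a toy three-word vocabulary and made-up scores (the functions `next_token` and `label_segment`, the logits, and the exact marker strings are all hypothetical illustrations, not any real model's API). The same selection routine picks every token; whether a span counts as “thinking” or “responding” depends only on which marker it follows.

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Turn raw scores into probabilities; a lower temperature sharpens the distribution.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def next_token(vocab, logits, temperature=0.0, rng=None):
    # temperature == 0.0 means greedy decoding: always take the most likely token.
    if temperature == 0.0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return vocab[best]
    # Otherwise sample one token in proportion to its softmax probability.
    probs = softmax(logits, temperature)
    rng = rng or random.Random()
    return rng.choices(vocab, weights=probs, k=1)[0]

def label_segment(text):
    # The "thinking" vs "responding" label is purely a matter of which marker precedes the text
    # (hypothetical marker strings for illustration).
    if text.startswith("ai-thoughts:"):
        return "thinking"
    if text.startswith("ai-response:"):
        return "responding"
    return "other"

if __name__ == "__main__":
    vocab = ["yes", "no", "maybe"]
    logits = [2.0, 1.0, 0.5]  # made-up scores
    print(next_token(vocab, logits))  # greedy pick
    print(next_token(vocab, logits, temperature=1.0, rng=random.Random(0)))  # sampled pick
    print(label_segment("ai-thoughts: the user wants..."))
```

With greedy decoding the call always returns `yes` here; with a nonzero temperature it can return any of the three words, weighted by probability. Nothing in `next_token` knows which segment it is producing.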
That is what was observed: o1-family models performed the “cheat”, and non-reasoning models didn’t.