by rosstaylor90 487 days ago
RL has more than two steps...
1 comment

The point is that reasoning is about more than the conclusion. If your steps are wrong, your reasoning is wrong regardless of the conclusion. Poor reasoning is what could make an LLM conclude that 1 + 2 = 3 but that 2 + 1 = [some number other than 3].
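
To make that concrete, here is a toy sketch in Python. Nothing in it comes from a real model: `memorized_add`, `outcome_only_check`, and `consistency_check` are made-up names, and the lookup table just stands in for a system that recalls one answer instead of actually adding. Checking only the conclusion passes it, while checking that the same question asked the other way round gives the same answer catches it.

```python
def memorized_add(a, b):
    # Hypothetical stand-in for a model that recalls one answer
    # rather than performing addition.
    known = {(1, 2): 3}
    # Off the memorized case it is deliberately wrong by one.
    return known.get((a, b), a + b + 1)


def outcome_only_check(add):
    # Grades a single conclusion, so a memorized answer passes.
    return add(1, 2) == 3


def consistency_check(add, pairs):
    # Sound addition is correct in both argument orders,
    # so each pair is tested as (a, b) and as (b, a).
    return all(add(a, b) == a + b and add(b, a) == a + b for a, b in pairs)


if __name__ == "__main__":
    print(outcome_only_check(memorized_add))
    print(consistency_check(memorized_add, [(1, 2)]))
```

The first check only looks at whether 1 + 2 came out as 3. The second also asks for 2 + 1, which is where the bad process shows up.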