Hacker News
Chain-of-Thought Reasoning Is a Policy Improvement Operator (arxiv.org)
2 points by hughzhang 940 days ago