| I am not a theoretical CS or math expert by any means, but I have been wrangling coding agents for a while and reading the paper and the problems Stapper had with dealing w/ Claude (context management, instruction following, etc) decided to see if I could replicate with a slightly better harness. The results were pretty interesting: https://github.com/lhl/claudecycles-revisited - My original setup left traces of the PDF paper and after GPT 5.3-Codex xhigh reached an impasse it went looking for it and found it! - I went and did cleanroom (basically one-shot) passes for GPT 5.2 xhigh, GPT 5.3-Codex xhigh, and Claude Opus 4.6 ultrathink and 5.2/5.3 found alternate solutions for odd m >= 5 , Opus 4.6 did not find any proofs but tried more approaches to solving. Full comparison/analysis here: https://github.com/lhl/claudecycles-revisited/blob/main/COMP... I've also included the session traces and analysis in the repo branches. Also, the AGENTS.md was pretty simple, but that harness produced consistent process outcomes across all three models: - All built verifiers first - All maintained worklogs with exact commands - All archived machine-readable artifacts - All documented failed approaches - All maintained restart-safe context capsules |