| HN Mirror

fair to call out but half true. i did send claude off to look up specific stats on failure modes (62% assertion correctness, etc), but the design decisions came from my own reading of anthropic's reports, the columbia daplab paper i cited, and a mix of matt pocock's lectures + my own anecdotal experience running this loop on real projects.