| HN Mirror



	by evanklem2004 57 days ago
	fair to call out, but only half true. i did send claude off to look up specific stats on failure modes (62% assertion correctness, etc.), but the design decisions came from my own reading of anthropic's reports, the columbia daplab paper i cited, and a mix of matt pocock's lectures and my own anecdotal experience running this loop on real projects.