by esperent 53 days ago
No disrespect intended, because this is also how I would do it (though I wouldn't publish it and name it after myself!), but they didn't read the research. They had the AI that actually created this do the reading for them.

fair to call out, but only half true. i did send claude off to look up specific stats on failure modes (62% assertion correctness, etc.), but the design decisions came from my own reading of anthropic's reports, the columbia daplab paper i cited, and a mix of matt pocock's lectures + my own anecdotal experience running this loop on real projects.