I addressed this in my reply to kelseyfrog above. The short version: the production work is proprietary, but the tooling I used to do the analysis is open source.
Yeah, it's "my code lives in another (gas) town, you wouldn't know her". Same for some undisclosed open source projects.
I can't imagine letting any LLMs do 500+ hours of autonomous work on any code at my company, or even for my own project (hundreds of thousands of lines of unreviewable slop? no thank you). Especially for the amount of features you claim they implement from scratch.
I also don't believe anything about "2 agents running for 12 hours", given how fast they exhaust context, become extremely stupid, completely ignore most of the previous work on subsequent runs, and happily ignore any explicit instructions, despite any "guardrails".
Funnily enough, literally right now in my current session Claude has "forgotten" most instructions from its global memory *and* its local CLAUDE.md.