by mrothroc
105 days ago
Nice, I've been working on the same problem from a different direction. Instead of analyzing sessions after the fact, I built a pipeline that structures them: stages (plan, design, code, review, the same as you'd have with humans) with gates in between. The gates categorize issues as auto-fix or human-review. Auto-fix issues get sent back to the coding agent, the result gets re-reviewed, and only the hard stuff makes it to me. That structure took me from about 73% first-pass acceptance to over 90%. What I've been focused on lately is figuring out which gates actually earn their keep and which ones overlap with each other. The session-level analytics you're building would be useful on top of this, since I don't have great visibility into token usage or timing per stage right now.

I wrote up the analysis: https://michael.roth.rocks/research/543-hours/

I also open sourced my log analysis tools: https://github.com/mrothroc/claude-code-log-analyzer
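
For readers who want something concrete, here is a minimal sketch of the gate loop described above. It is not the author's implementation: `Issue`, `run_gate`, `toy_review`, `toy_fix`, and the `max_rounds` limit are all hypothetical names and choices made for illustration. The idea is only that a gate sorts issues into auto-fix and human-review, loops the auto-fix ones back to the coding step, and escalates whatever is left.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

AUTO_FIX = "auto-fix"
HUMAN_REVIEW = "human-review"


@dataclass(frozen=True)
class Issue:
    description: str
    category: str  # AUTO_FIX or HUMAN_REVIEW


def run_gate(
    artifact: str,
    review: Callable[[str], List[Issue]],
    fix: Callable[[str, List[Issue]], str],
    max_rounds: int = 3,  # assumed cap so a stubborn issue cannot loop forever
) -> Tuple[str, List[Issue]]:
    """Return the final artifact and the issues that need a human."""
    for _ in range(max_rounds):
        issues = review(artifact)
        auto = [i for i in issues if i.category == AUTO_FIX]
        if not auto:
            # Nothing left to send back; only human-review issues remain.
            return artifact, [i for i in issues if i.category == HUMAN_REVIEW]
        artifact = fix(artifact, auto)  # send back to the coding step
    # Rounds exhausted: escalate everything still open, auto-fix included.
    return artifact, review(artifact)


# Toy stand-ins for the reviewer and the coding agent (hypothetical).
def toy_review(code: str) -> List[Issue]:
    issues = []
    if "TODO" in code:
        issues.append(Issue("leftover TODO", AUTO_FIX))
    if "eval(" in code:
        issues.append(Issue("use of eval", HUMAN_REVIEW))
    return issues


def toy_fix(code: str, issues: List[Issue]) -> str:
    return code.replace("TODO", "").rstrip()
```

Calling `run_gate("x = eval(s)  # TODO", toy_review, toy_fix)` clears the TODO automatically on the first round and hands back only the `eval` issue for a human. In a real pipeline, `review` and `fix` would wrap agent calls, and logging the per-gate issue counts is one way to see which gates overlap.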