| HN Mirror


by mrothroc 105 days ago

Nice, I've been working on the same problem from a different direction. Instead of analyzing sessions after the fact, I built a pipeline that structures them. Stages (plan, design, code, review, same as you'd have with humans) with gates in between.

The gates categorize issues as either auto-fix or human-review. Auto-fix issues get sent back to the coding agent, the gate reviews the result again, and only the hard stuff makes it to me. That structure took me from about 73% first-pass acceptance to over 90%.
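To make the gate loop above concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the actual pipeline: `Issue`, `run_gate`, the `auto_fixable` flag, and the `max_rounds` budget are all hypothetical names, and `review` and `fix` stand in for whatever the reviewing gate and the coding agent really do.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Issue:
    description: str
    auto_fixable: bool  # hypothetical flag: True = auto-fix, False = human-review


def run_gate(
    review: Callable[[str], List[Issue]],
    fix: Callable[[str, List[Issue]], str],
    artifact: str,
    max_rounds: int = 3,  # assumed retry budget, not from the original post
) -> Tuple[str, List[Issue]]:
    """Send auto-fixable issues back to the agent; return what a human must see."""
    for _ in range(max_rounds):
        issues = review(artifact)
        auto = [i for i in issues if i.auto_fixable]
        if not auto:
            # Nothing left to auto-fix: any remaining issues need a human.
            return artifact, issues
        artifact = fix(artifact, auto)
    # Retry budget exhausted: escalate whatever the final review still finds.
    return artifact, review(artifact)
```

Called with a toy `review` that flags leftover TODOs as auto-fixable and unsafe calls as human-review, the loop strips the TODOs on the first round and returns only the unsafe-call issue for escalation.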

What I've been focused on lately is figuring out which gates actually earn their keep and which ones overlap with each other. The session-level analytics you're building would be useful on top of this, since I don't have great visibility into token usage or timing per stage right now.

I wrote up the analysis: https://michael.roth.rocks/research/543-hours/

I also open sourced my log analysis tools: https://github.com/mrothroc/claude-code-log-analyzer

1 comment

keks0r 105 days ago

This is great. How are you "identifying" these stages in the session? Or is it just different slash commands / skills per stage? If it's something generic enough, maybe we can build the analysis into it so that it works for your use case. Otherwise, feel free to fork the repo and add your additional analysis. Let me know if you need help.


mrothroc 105 days ago

I use prompt templates, so the first version of my analysis script, which I ran on my own logs, just looked for those. However, to make it generic, I switched to using Gemini as a classifier. That's what's in the repo.
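Roughly, the template-matching approach could look like the Python sketch below. This is an assumption-laden illustration, not the code in the repo: the marker strings in `STAGE_MARKERS` and the `identify_stage` function are made up, and the optional `fallback` callable is only a placeholder for where a model-based classifier could plug in.

```python
from typing import Callable, Optional

# Hypothetical marker phrases; real prompt templates would supply their own.
STAGE_MARKERS = {
    "plan": "## Stage: Plan",
    "design": "## Stage: Design",
    "code": "## Stage: Code",
    "review": "## Stage: Review",
}


def identify_stage(
    prompt: str,
    fallback: Optional[Callable[[str], str]] = None,
) -> str:
    """Label a prompt with its pipeline stage by looking for template markers."""
    for stage, marker in STAGE_MARKERS.items():
        if marker in prompt:
            return stage
    # No known template matched: defer to a generic classifier if one is given.
    if fallback is not None:
        return fallback(prompt)
    return "unknown"
```

Marker matching is cheap and deterministic but only works on logs produced with known templates, which is the reason for moving to a classifier in the generic case.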
