If you want to test it across coding tasks, have a look at https://github.com/adam-s/testing-claude-agent