by wsxiaoys
82 days ago
> Would be interesting to know how much a jj-specific SKILL.md would raise the score.

That is definitely something we're interested in; we will try running this evaluation with skills soon.

> This might not fit the evaluation framework, but I'd still be interested in your experience/setup with terminal-based coding agents like Claude Code.

We have adopted Harbor as our evaluation framework, so evaluating Claude Code is straightforward: https://harborframework.com/docs/agents#installed-agents