by Davidiusdadi
89 days ago
Thanks for creating this. Would be interesting to know how much a jj-specific SKILL.md would raise the score. Maybe this does not fit the evaluation framework, but I'd still be interested in your experience / setup with e.g. a terminal-based coding agent such as Claude Code.
That is definitely something we're interested in; we will try running this evaluation with skills soon.
> This might not fit the evaluation framework, but I'd still be interested in your experience/setup with terminal-based coding agents like Claude Code.
We have adopted Harbor as our evaluation framework, so evaluating Claude Code is straightforward: https://harborframework.com/docs/agents#installed-agents