by Davidiusdadi 89 days ago
Thanks for creating this.

Would be interesting to know how much a jj-specific SKILL.md would raise the score.

Maybe this does not fit the evaluation framework, but I'd still be interested in your experience/setup with a terminal-based coding agent such as Claude Code.

1 comment

> Would be interesting to know how much a jj-specific SKILL.md would raise the score.

That is definitely something we're interested in; we will try running this evaluation with skills soon.

> This might not fit the evaluation framework, but I'd still be interested in your experience/setup with terminal-based coding agents like Claude Code.

We have adopted Harbor as our evaluation framework, so evaluating Claude Code is straightforward: https://harborframework.com/docs/agents#installed-agents