Hacker News
by furyofantares 118 days ago
If it was easy to write evals, I would come at it from that direction.

But since it's not, to avoid working on AGENTS.md blind, I test each change against whatever made me write it.

I have some prompt, and the AI messes it up in a way I think it shouldn't. Maybe it's something I've seen it do before and I'm sick of it. So I update AGENTS.md, revert the AI's changes, /undo in the chat context, and re-submit the same prompt to see whether the update fixed it.
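
A rough sketch of that loop in Python, for illustration only. Everything here is hypothetical: `revert_changes`, `run_agent`, and `is_fixed` are placeholders you would supply yourself (for example a git revert plus /undo, a call into whatever agent you use, and a check for the specific mistake). The `attempts` parameter is my own addition, since a single re-run may pass or fail by chance.

```python
from typing import Callable


def retest_prompt(
    prompt: str,
    revert_changes: Callable[[], None],  # hypothetical: undo the agent's edits and chat turn
    run_agent: Callable[[str], str],     # hypothetical: submit the prompt, return the output
    is_fixed: Callable[[str], bool],     # hypothetical: True if the old mistake is gone
    attempts: int = 1,
) -> int:
    """Re-run the same prompt from a clean state; return how many runs pass."""
    passes = 0
    for _ in range(attempts):
        revert_changes()            # start from the state before the bad run
        output = run_agent(prompt)  # same prompt, now read against the updated AGENTS.md
        if is_fixed(output):
            passes += 1
    return passes
```

If the count stays at zero after an AGENTS.md edit, the new instruction probably isn't doing what you hoped.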

1 comment

Tessl can generate the evals, both to test against Anthropic best practices and to run scenarios with and without the skill to check whether it's helping.
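
The with-and-without comparison can be sketched generically in Python. This is not Tessl's API, which I am not describing here; `run_scenario` is a hypothetical callable that runs one scenario with the skill on or off and reports whether it passed.

```python
from typing import Callable, Sequence


def compare_skill(
    scenarios: Sequence[str],
    run_scenario: Callable[[str, bool], bool],  # hypothetical: (scenario, skill_enabled) -> passed
) -> tuple[float, float]:
    """Return (pass rate without the skill, pass rate with the skill)."""
    if not scenarios:
        raise ValueError("need at least one scenario")
    total = len(scenarios)
    without_skill = sum(run_scenario(s, False) for s in scenarios) / total
    with_skill = sum(run_scenario(s, True) for s in scenarios) / total
    return without_skill, with_skill
```

A higher second number suggests the skill helps on those scenarios; with only a handful of scenarios the gap can easily be noise.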