HN Mirror


by furyofantares 118 days ago

If it were easy to write evals, I would come at it from that direction.

But since it's not, to avoid working on AGENTS.md blind, I test each change against whatever caused me to write it.

I have some prompt, and the AI messes it up in some way that I think it shouldn't; maybe it's something I've seen it do before and I'm sick of it. So I update AGENTS.md, revert the changes, /undo in the chat context, and re-submit the same prompt to see whether the update actually fixes it.
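A rough sketch of that loop in Python, in case it helps to see it written down. Everything here is an assumption for illustration: `agent` stands in for however you invoke your coding assistant (it is not a real API), `looks_right` is whatever manual or scripted check you apply to the output, and `rerun_same_prompt` is a made-up name.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signature: (prompt, AGENTS.md text) -> the agent's output.
Agent = Callable[[str, str], str]


@dataclass
class RerunResult:
    before_ok: bool  # did the output look right with the old AGENTS.md?
    after_ok: bool   # did it look right with the edited AGENTS.md?

    @property
    def fixed(self) -> bool:
        # The edit helped only if the prompt failed before and passes now.
        return (not self.before_ok) and self.after_ok


def rerun_same_prompt(
    agent: Agent,
    prompt: str,
    old_agents_md: str,
    new_agents_md: str,
    looks_right: Callable[[str], bool],
) -> RerunResult:
    """Run the identical prompt under the old and the new AGENTS.md text."""
    before = agent(prompt, old_agents_md)
    after = agent(prompt, new_agents_md)
    return RerunResult(looks_right(before), looks_right(after))
```

In practice the "before" run is the one that annoyed you in the first place, so the only new work is the second call. A single re-run is one sample from a non-deterministic system, so treat `fixed` as a hint rather than proof.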

1 comment

sjmaplesec 117 days ago

Tessl can generate the evals, both to test against Anthropic best practices and to run scenarios with and without the skill to check whether it's helping.
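For anyone curious what a with/without comparison amounts to, here is a minimal sketch in Python. It is not Tessl's API or implementation; `run_scenario`, `pass_rate`, and `skill_lift` are hypothetical names, and the runner is assumed to return `True` when a scenario passes.

```python
from typing import Callable, Optional, Sequence

# Hypothetical runner: (scenario prompt, skill text or None) -> True if it passed.
RunScenario = Callable[[str, Optional[str]], bool]


def pass_rate(
    run_scenario: RunScenario,
    scenarios: Sequence[str],
    skill: Optional[str],
) -> float:
    """Fraction of scenarios that pass with the given skill (None = no skill)."""
    if not scenarios:
        raise ValueError("need at least one scenario")
    passed = sum(1 for s in scenarios if run_scenario(s, skill))
    return passed / len(scenarios)


def skill_lift(
    run_scenario: RunScenario,
    scenarios: Sequence[str],
    skill: str,
) -> float:
    """Pass rate with the skill minus pass rate without it."""
    with_skill = pass_rate(run_scenario, scenarios, skill)
    without_skill = pass_rate(run_scenario, scenarios, None)
    return with_skill - without_skill
```

A lift near zero suggests the skill is not changing outcomes on these scenarios; a negative lift suggests it is getting in the way.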
