| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by reedlaw 83 days ago

I'm skeptical skills will outperform training given that Opus 4.7 already ignores a 720-byte CLAUDE.md telling it to use tidewave (a Rails MCP server with 6 tools) for db queries. When I asked a new claude session about a record it called

> Bash(DATABASE_URL=$(grep -E '^DATABASE_URL=' .env 2>/dev/null | head -1) echo "ok")

even though I have in CLAUDE.md:

> For database queries, use tidewave first.

I then prompted:

> use tidewave as per CLAUDE.md. also diagnose why you failed to heed that

> ● Diagnosis first: I defaulted to shell habits (env grep → psql) instead of pausing to recall the CLAUDE.md rule that tidewave is the first-line DB tool. The trigger was "look at this record" — I should have read that as "run a SQL query" and reached for tidewave immediately.

If Opus 4.7 doesn't follow simple CLAUDE.md instructions, I'm not sure what benefits other markdown files could bring. I don't trust Opus's own explanation, but it could point to the fact that the system prompt for bash is much longer than CLAUDE.md with tidewave.

While LLM judging could be helpful, I think the tool-call assertions (https://github.com/darkrishabh/agent-skills-eval#what-you-ge...) may be the most useful thing in agent-skills-eval given that it's the only objective measure of compliance.

3 comments

cyanydeez 83 days ago

I've had minor success with chiding the clanker, after it ignores something, to "please revise AGENTS.md to never do <whatever stupid thing it did> to prevent future assistances from doing x."

So, atleast heuristically, it should know _why_ it ignored whatever and hopefully pulls the correct anti-matter context. It took about two reptitions of this to get it to use pg-promise instead of psql to do queries for me. I assume the longer the context goes on, the less likely any of priming works.

link

NamlchakKhandro 83 days ago

Your using Claude code. That's your problem.

Use a different harness

link

reedlaw 82 days ago

Codex is only slightly better, and that fluctuates so I switch back and forth.

Use a hook

I tried to create a hook that would detect when token usage was running out and write HANDOFF.md so I could switch to another agent and finish the current task. It never worked reliably. To make a hook for db queries, it would need to run before each bash call, check if it looks like a query, and then exit with a new prompt, e.g.: "Use tidewave's execute_sql_query for DB access". But then it could just ignore the prompt the same as CLAUDE.me. What if I really wanted to use bash for a specific task? The real issue is that prompts are not tightly coupled with capabilities. If we admit that, then skills are over hyped.

link

erispoe 76 days ago

Just block the bash pattern with a hook and nudge it towards the behavior you want.

link

rirze 83 days ago

It's hard to make hooks work here, since the default approach it's using is call the URL directly.

I think it's better to have a repo-level skill instead, titled something like "connecting_to_db.md" and demonstrate exactly how to connect. Codex has been pretty good at referring to skills but it depends on context at the end of the day.

link