by alex_metacraft 129 days ago

I think this experiment has a fundamental flaw in its comparison setup.

What they're comparing is: (A) a skill with a short description in the frontmatter, which the agent may or may not decide to invoke, vs. (B) a massive compressed index of documentation paths dumped directly into AGENTS.md, which is always in context.

This isn't really "AGENTS.md vs skills." It's "always in context with a high token count vs. lazy-loaded behind a decision point." Of course the always-in-context version wins: you're giving the model far more information upfront, and the agent literally can't miss it. That finding isn't surprising; it's almost tautological.
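To make the asymmetry concrete, here is a rough Python sketch of the two setups. Everything in it is an assumption for illustration: the `Skill` class, the word-count stand-in for tokens, and the doc paths are all made up, not taken from the experiment.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    description: str  # short frontmatter text the agent always sees
    body: str         # full content, only loaded if the skill is invoked


def words(text: str) -> int:
    # Crude stand-in for a token count (assumption, not a real tokenizer).
    return len(text.split())


def always_in_context(index: str) -> int:
    # Setup (B): the whole index sits in AGENTS.md on every turn.
    return words(index)


def lazy_loaded(skill: Skill, invoked: bool) -> int:
    # Setup (A): only the description is visible until the agent opts in.
    return words(skill.description) + (words(skill.body) if invoked else 0)


# Hypothetical doc paths, purely illustrative.
docs_index = "docs/auth.md docs/rate-limits.md docs/sdk.md docs/endpoints.md"
skill = Skill(description="Documentation lookup skill", body=docs_index)

print(always_in_context(docs_index))      # 4
print(lazy_loaded(skill, invoked=False))  # 3
print(lazy_loaded(skill, invoked=True))   # 7
```

The index only reaches the model in setup (A) when `invoked` is true, so the comparison is mostly measuring how often that decision goes the right way.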

The more interesting question they don't address: what did their skill descriptions actually look like? In my experience, the quality of the frontmatter description is the single biggest factor in whether a skill gets invoked. A vague "Documentation lookup skill" will get ignored. A specific "Use this when the user asks about API endpoints, authentication, rate limits, or SDK usage for the Vercel platform" will get picked up reliably.
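A toy way to see the difference between those two descriptions, in Python. The keyword-overlap score is my own stand-in, not how any agent actually decides to invoke a skill, and the sample request is made up.

```python
import re


def keywords(text: str) -> set[str]:
    # Lowercased alphabetic words; no stopword filtering, to keep it short.
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap(description: str, request: str) -> int:
    # Number of words the description shares with the user's request.
    return len(keywords(description) & keywords(request))


vague = "Documentation lookup skill"
specific = (
    "Use this when the user asks about API endpoints, authentication, "
    "rate limits, or SDK usage for the Vercel platform"
)
# Hypothetical user request.
request = "How do I handle rate limits on the Vercel API?"

print(overlap(vague, request))     # 0
print(overlap(specific, request))  # 5
```

The vague description gives the model nothing to latch onto for this request, while the specific one shares the terms that matter ("rate", "limits", "api", "vercel"), plus the filler word "the".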

If the compressed pointers in AGENTS.md and the descriptions in skill frontmatter were written with the same level of detail, the gap would likely be much smaller. The real takeaway isn't "skills are worse." It's "if you don't invest effort in writing good skill descriptions, the agent won't know when to use them."