It's basically just a way for the LLM to lazy-load curated information, tools, and scripts into context. The benefit of making it a "standard" is that future generations of LLMs will be trained on this pattern specifically, and will get quite good at it.
This question doesn’t have anything to do with skills per se, this is just about how different agents handle context. I think right now the main way they cull context is by culling noisy tool call output. Skills are basically saved prompts and shouldn’t be that long, so they would probably not be near the top of the list of things to cull.
Claude Code subagents keep their context windows separate from the main agent, sending back only the most relevant context based on the main agent's request.
It's the description that gets inserted into the context, and then if that sounds useful, the agent can opt to use the skill. I believe (but I'm not sure) that the agent chooses what context to pass into the subagent, which gets that context along with the skill's context (the stuff in the Markdown file and the rest of the files in the FS).
This may all be very wrong, though, as it's mostly conjecture from the little I've worked with skills.
Claude also has custom slash-commands, so you can force skill usage as you see fit.
This lets you trigger a skill with '/foo' in a way that resembles the way you'd use the command line.
Claude Code is very good at using well-defined skills without a command though, but in a scenario where this is some nuance between similar skills they are useful.
BUT what makes them powerful is that you can include code with the skill package.
Like I have a skill that uses a Go program to traverse the AST of a Go project to find different issues in it.
You COULD just prompt it but then the LLM would have to dig around using find and grep. Now it runs a single executable which outputs an LLM optimised clump of text for processing.
“inserted at the start of any query” feels like a bit of a misunderstanding to me. It plops the skill text into the context when it needs it or when you tell it to. It’s basically like pasting in text or telling it to read a file, except for the bit where it can decide on its own to do it. I’m not sure start, middle, or end of query is meaningful here.