The "magic" comes from the JSON schema that is passed in alongside each tool definition.
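Structured Output APIs (including the Tool API) take that schema and build a context-free grammar from it. During generation, the grammar is used to mask the model's output: at each step, any token that would turn the text so far into something the grammar can no longer accept is excluded, so only tokens that keep the output valid can be produced.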
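Here is a toy sketch of that masking step in Python. It is not how any particular provider implements it, and several things are assumptions made for illustration. The grammar is a hand-written one (booleans and nested arrays) rather than one compiled from a JSON schema. The "model" is a hypothetical scripted scoring function, `toy_logits`. The vocabulary is a handful of made-up tokens, not a real tokenizer's. Brackets must balance to arbitrary depth, which is why the checker keeps a stack (a depth counter, since there is only one bracket type) instead of being a plain finite-state machine.

```python
# Toy grammar (an assumption for this sketch, not derived from a real schema):
#   value := "true" | "false" | "[" [ value ("," value)* ] "]"
LITERALS = ("true", "false")
EOS = "<eos>"  # hypothetical end-of-sequence token


def scan(text: str) -> str:
    """Return 'complete', 'partial', or 'invalid' for text under the toy grammar."""
    depth = 0          # one bracket type, so a counter acts as the pushdown stack
    state = "value"    # 'value' | 'value_or_close' | 'literal' | 'after'
    literal = ""       # characters of the true/false literal read so far
    for ch in text:
        if state == "literal":
            literal += ch
            if not any(lit.startswith(literal) for lit in LITERALS):
                return "invalid"
            if literal in LITERALS:
                state = "after"
            continue
        if state in ("value", "value_or_close"):
            if ch == "[":
                depth += 1
                state = "value_or_close"   # right after '[', an immediate ']' is allowed
            elif ch == "]" and state == "value_or_close":
                depth -= 1
                state = "after"
            elif any(lit.startswith(ch) for lit in LITERALS):
                literal, state = ch, "literal"
            else:
                return "invalid"
            continue
        # state == "after": a full value was just read
        if depth > 0 and ch == ",":
            state = "value"                # no trailing comma: a value must follow
        elif depth > 0 and ch == "]":
            depth -= 1
        else:
            return "invalid"               # nothing may follow a finished top-level value
    return "complete" if state == "after" and depth == 0 else "partial"


def allowed_tokens(prefix: str, vocab: list[str]) -> set[str]:
    """The mask: tokens whose addition keeps the output acceptable to the grammar."""
    allowed = set()
    for tok in vocab:
        if tok == EOS:
            if scan(prefix) == "complete":  # may only stop on a complete value
                allowed.add(tok)
        elif scan(prefix + tok) != "invalid":
            allowed.add(tok)
    return allowed


def constrained_greedy_decode(logits_fn, vocab: list[str], max_steps: int = 20) -> str:
    out = ""
    for step in range(max_steps):
        scores = logits_fn(step)
        allowed = allowed_tokens(out, vocab)
        if not allowed:
            raise RuntimeError("no token can continue this prefix")
        # Masked tokens get -inf, so they can never win the argmax.
        masked = {t: scores[t] if t in allowed else float("-inf") for t in vocab}
        best = max(vocab, key=lambda t: masked[t])
        if best == EOS:
            return out
        out += best
    raise RuntimeError("hit max_steps before the grammar was satisfied")


# Hypothetical "model": fixed base scores plus a large boost for a scripted token
# per step. The script deliberately includes tokens the grammar forbids.
BASE = {"{": 5.0, "[": 3.0, ",": 2.0, "]": 1.0, "true": 0.5, "false": 0.4, EOS: 0.0}
VOCAB = list(BASE)
SCRIPT = ["{", "true", "true", "false", "]", ","]


def toy_logits(step: int) -> dict[str, float]:
    scores = dict(BASE)
    if step < len(SCRIPT):
        scores[SCRIPT[step]] += 10.0
    return scores


print(constrained_greedy_decode(toy_logits, VOCAB))
```

The scripted model "wants" to open with `{`, emit a second `true` with no comma, and add a `,` after the closing bracket. Each of those tokens is masked out, the best allowed token wins instead, and the decoder prints `[true,false]`. Unmasked, this model would have started with `{`, which the grammar never accepts.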