| HN Mirror

Thanks! Our goal is to make rl "just work" with completely automated GPU provisioning/algorithm selection/SFT-warm up, but giving people the ability to switch away from the defaults if they want to.

The way tools currently work in the beta is you add tools via MCP to the configuration, and they get passed in as additional context for the model; the model might then choose to use a tool during inference; the tool is then automatically called and the output is returned as a tool message. If you really want to you could parse the tool output as part of reward calculation, but I expect you'd usually base the reward just on the model's completion. I could give more details if there's a specific tool setup you're envisioning!