Show HN: Open Source Reinforcement Fine-Tuning for Your Agents

Hi HN, we are building a reinforcement-learning fine-tuning service for LLMs.

As we all know, agents fail all the time, especially when you try to use them for something actually useful. Current approaches suck: prompting has intrinsic limits, and supervised fine-tuning requires big, explicit datasets that are hard to collect.

So we built a platform to solve that with reinforcement learning, specifically GRPO, inspired by DeepSeek R1.
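For readers who have not seen GRPO before: the core idea is to sample a group of completions for the same prompt, score each one, and then judge every completion relative to its own group instead of training a separate value model. Here is a minimal sketch of just that group-relative advantage step. The function name, the use of population standard deviation, and the epsilon value are illustrative assumptions, not our training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one sampled group (hypothetical helper).

    Each completion's advantage is its reward minus the group mean,
    divided by the group standard deviation. eps avoids division by
    zero when every completion in the group got the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four completions for the same prompt, scored by a reward function.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Completions that beat their group's average get a positive advantage and are reinforced; the rest are pushed down. If every completion in a group scores the same, all advantages are zero and that group contributes no learning signal.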

You let us intercept your agent's data flow, and we deliver a fine-tuned open-source model trained on your agent's specific task.

Instead of providing big datasets of explicit fine-tuning samples, you provide a reward function that scores the model's outputs. Our reinforcement fine-tuning is very sample-efficient, so you get significantly better results with fewer training steps.
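To make that concrete, here is a minimal sketch of what a reward function for a verifiable task could look like. The signature, the <answer> tag convention, and the score values are assumptions for illustration, not the platform's actual interface.

```python
import re


def reward(completion: str, expected_answer: str) -> float:
    """Score one model output against a known-correct answer (hypothetical).

    Returns 1.0 for a correct final answer, 0.1 for a well-formatted but
    wrong answer, and 0.0 when no <answer>...</answer> block is found.
    """
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0  # output did not follow the expected format
    answer = match.group(1).strip()
    if answer == expected_answer.strip():
        return 1.0  # verifiably correct
    return 0.1  # small partial credit for following the format


if __name__ == "__main__":
    print(reward("Let me think. <answer> 42 </answer>", "42"))
    print(reward("I am not sure.", "42"))
```

The point is that you only need a way to check an output, not a dataset of ideal outputs. The partial credit for formatting is a design choice: it gives the model some signal early on, before it starts getting answers right.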

The current paradigm is best suited for “verifiable domains”, like teaching models to reason, write code, use tools and the web, play chess, etc. We are working on natively supporting training on MCP servers (so agents actually learn to use the MCP tools), codebases (so the model understands every edge case of your codebase), and browser agent frameworks.

Next, we will also support an “alignment mode”, where you don’t have to provide a reward function and instead give high-level feedback on past failed runs of your agent.

It is basically the open-source version of OpenAI’s reinforcement fine-tuning API, but deeply integrated into your agent’s stack.

Depending on demand, we are also considering making the fine-tuned models downloadable in the future.

We are giving the first 50 signups $20 in free training credits so you can try it out. We would love to hear your feedback!