
Thanks! Yes, absolutely. OpenAI already has a reinforcement learning fine-tuning API in closed beta. However, historically, they’ve always left significant room for integrations into users' systems. E.g., in the current demo of their RL fine-tuning platform, you can only select predefined reward functions and must manually upload the query datasets. I think that gap is the reason so many open-source supervised fine-tuning companies exist.

My long-term take is that the agent economy will be built around a few labs providing (partially open-source) foundational models. You don’t want to be part of that competition, as it will be the AI equivalent of the high-frequency trading arms race. Above that will sit an infrastructure layer that specializes these very models to users' domains. OpenAI/Anthropic/… RL fine-tuning will be part of that infrastructure layer, but so will open-source-model alternatives like ours.