
I think you have to be OpenAI (or X, Google, or Anthropic) to be able to fine-tune models of this scale with reinforcement learning at present.

Look at Tinker for an example of where things might be heading, though (https://tinker-docs.thinkingmachines.ai/).

For now, though, I get the sense that reinforcement learning at scale is the main battleground (and has been for most of 2025). But we also see that, over time, the general models adopt the skills taught to the specialized models. Look at how the learning behind codex-1 went into GPT-5.