by trashtester 635 days ago
There are many sources and hints out there, but here are some details from one of the devs at OpenAI:

https://x.com/_jasonwei/status/1834278706522849788

Notice that the CoT is trained via RL, meaning whatever generates the CoT is itself a trained model (or part of the main model).

Also, RL means it's not limited to the original data the way traditional LLMs are. It implies that the CoT process itself is trained based on its own performance, meaning the steps of the CoT from previous runs are fed back into the training process as more data.
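
To make that feedback loop concrete, here is a toy sketch in Python. To be clear, this is not OpenAI's method, which isn't public. It's a plain REINFORCE update on a three-option bandit, where each option stands in for a "reasoning strategy". The names `train` and `success_rate`, and all of the numbers, are made up for illustration. The only point is that the training signal comes from scoring the model's own samples, not from a fixed dataset.

```python
import math
import random


def softmax(logits):
    """Turn preference scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=5000, lr=0.1, seed=0):
    rng = random.Random(seed)
    # Hypothetical toy "policy": one preference score per reasoning strategy.
    logits = [0.0, 0.0, 0.0]
    # Hypothetical chance that each strategy ends in a correct answer.
    # The learner never reads this table directly; it only sees rewards.
    success_rate = [0.2, 0.5, 0.9]
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a "chain" from the model's own current distribution.
        choice = rng.choices(range(3), weights=probs)[0]
        # The reward comes from checking the outcome, not from labeled data.
        reward = 1.0 if rng.random() < success_rate[choice] else 0.0
        # REINFORCE: the gradient of log pi(choice) w.r.t. the logits
        # is one_hot(choice) - probs, scaled here by the reward.
        for i in range(3):
            grad = (1.0 if i == choice else 0.0) - probs[i]
            logits[i] += lr * reward * grad
    return softmax(logits)


if __name__ == "__main__":
    print(train())
```

Every update here uses a sample the policy generated itself, so the "dataset" keeps growing as training runs. A real system would presumably sample full token sequences and use a much more elaborate reward and optimizer, but the loop has the same shape: sample, score, reinforce.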