Hacker News
by ahmedhawas123 325 days ago
Exciting as this is to toy around with...

Perhaps I missed it somewhere, but I find it frustrating that, unlike most other open-weight models and despite this being an open release, OpenAI has chosen to provide pretty minimal transparency regarding model architecture and training. It's become the norm for Llama, DeepSeek, Qwen, Mistral and others to provide a pretty detailed write-up on the model, which lets researchers build on the work and compare notes.


Their model card [0] has some information. It is quite a standard architecture though; it's always been that their alpha is in their internal training stack.

[0] https://cdn.openai.com/pdf/419b6906-9da6-406c-a19d-1bb078ac7...

This is super helpful and I had not seen it, thanks so much for sharing! And I hear you on training being the alpha. Given the size of the model, I wonder how much of this comes from distillation and o3/o4 data.

The model files contain an exact description of the network's architecture; there isn't anything novel.
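For anyone wondering what "the model files" means here: open-weight checkpoints distributed in Hugging Face format typically ship a config.json that lists the architecture hyperparameters. Below is a minimal sketch, assuming a Hugging Face-style config with common key names, of reading architecture facts straight from that file. The config contents, its values, and the helper `summarize_architecture` are all hypothetical illustrations, not the actual file for this release.

```python
import json
from typing import Any

# Hypothetical config.json in the style of Hugging Face checkpoints.
# Key names follow common conventions; the values are made-up placeholders,
# not the real numbers for any released model.
EXAMPLE_CONFIG = """
{
  "architectures": ["ExampleForCausalLM"],
  "num_hidden_layers": 4,
  "hidden_size": 512,
  "num_attention_heads": 8,
  "num_key_value_heads": 2,
  "num_local_experts": 16,
  "num_experts_per_tok": 2,
  "vocab_size": 32000
}
"""


def summarize_architecture(config: dict[str, Any]) -> dict[str, Any]:
    """Derive a few architecture facts directly from a config dict (hypothetical helper)."""
    heads = config["num_attention_heads"]
    # If no KV-head count is given, assume standard multi-head attention.
    kv_heads = config.get("num_key_value_heads", heads)
    # If no expert count is given, assume a dense (non-MoE) model.
    experts = config.get("num_local_experts", 1)
    is_moe = experts > 1
    return {
        "layers": config["num_hidden_layers"],
        "head_dim": config["hidden_size"] // heads,
        # Fewer KV heads than query heads indicates grouped-query attention.
        "attention": "GQA" if kv_heads < heads else "MHA",
        "query_heads_per_kv_head": heads // kv_heads,
        "mixture_of_experts": is_moe,
        "active_experts_per_token": config.get("num_experts_per_tok", 1) if is_moe else 1,
    }


if __name__ == "__main__":
    summary = summarize_architecture(json.loads(EXAMPLE_CONFIG))
    for key, value in summary.items():
        print(f"{key}: {value}")
```

The point is only that layer count, attention layout and expert routing are all spelled out in plain text next to the weights, so the architecture itself is not where anything would be hidden.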

Given that these new models are closer to SOTA than they are to competing open models, this suggests that the 'secret sauce' at OpenAI is primarily about training rather than model architecture.

Which is why they won't talk about the training.