

	by ahmedhawas123 325 days ago
	Exciting as this is to toy around with... Perhaps I missed it somewhere, but I find it frustrating that, unlike most other open-weight models and despite this being an open release, OpenAI has chosen to provide pretty minimal transparency regarding model architecture and training. It's become the norm for Llama, DeepSeek, Qwen, Mistral and others to provide a pretty detailed write-up on the model, which allows researchers to advance and compare notes.

2 comments

gundawar 325 days ago

Their model card [0] has some information. It is quite a standard architecture though; it's always been that their alpha is in their internal training stack.

[0] https://cdn.openai.com/pdf/419b6906-9da6-406c-a19d-1bb078ac7...


ahmedhawas123 325 days ago

This is super helpful and I had not seen it, thanks so much for sharing! And I hear you on training being their alpha. At the size of the model, I wonder how much of this is distillation and using o3/o4 data.


sebzim4500 325 days ago

The model files contain an exact description of the architecture of the network; there isn't anything novel.

Given these new models are closer to the SOTA than they are to competing open models, this suggests that the 'secret sauce' at OpenAI is primarily about training rather than model architecture.

Hence why they won't talk about the training.
