by HarHarVeryFunny 2 hours ago
> The science behind these models are being worked on IN PUBLIC. The research is not secret. The implementations will all catch up.

Only to a limited extent - the US companies stopped sharing research a long time ago, other than Anthropic's interpretability research (which also seems to have dried up?). Interestingly, most of the sharing is now coming from the Chinese side, largely DeepSeek. Zhipu/Z.ai (GLM) is also a partner in the Slime RL training framework.

I wouldn't call much, if any, of this "science" - it's all empiricism. Throw spaghetti at the wall and see what sticks. There's a famous quote from Noam Shazeer:

"We offer no explanation as to why these architectures seem to work; we attribute their success, as all else, to divine benevolence"

https://arxiv.org/abs/2002.05202v1

Jakob Uszkoreit has also talked about the empiricism it took to make what would become the Transformer, or any complex neural network architecture, work.