Hacker News new | ask | show | jobs
by solenoid0937 108 days ago
Sure the architecture is from 2017. But the gap between GPT-1 and frontier models today is not simply "more FLOPs" and as simple as "standing up PyTorch and vllm" - theres thousands of undocumented decisions about data, alignment, reward modeling, training stability, and inference-time strategies, and lots of tribal knowledge held by a small group of people who overwhelmingly do not want to work on weapons systems.

The dense model argument is self-defeating long term. Sparsity (MoE etc.) lets you build a smarter model at the same compute budget, so going dense because you can afford to waste FLOPs is how you fall behind b/c you never came up with the step function improvements needed.

Sure, the DoD invented HPC, but it also invented the internet, and then the private sector made it actually useful.