|
|
|
|
|
by solenoid0937
108 days ago
|
|
Sure the architecture is from 2017. But the gap between GPT-1 and frontier models today is not simply "more FLOPs" and as simple as "standing up PyTorch and vllm" - theres thousands of undocumented decisions about data, alignment, reward modeling, training stability, and inference-time strategies, and lots of tribal knowledge held by a small group of people who overwhelmingly do not want to work on weapons systems. The dense model argument is self-defeating long term. Sparsity (MoE etc.) lets you build a smarter model at the same compute budget, so going dense because you can afford to waste FLOPs is how you fall behind b/c you never came up with the step function improvements needed. Sure, the DoD invented HPC, but it also invented the internet, and then the private sector made it actually useful. |
|