by rahen
78 days ago
Yes. The Cray supercomputers from the 80s were crazy good matmul machines in particular. The quad-CPU Cray X-MP (1984) could sustain 800 MFLOPS to 1 GFLOPS and, with a 1 GB SSD, had enough compute power and bandwidth to train a 7-10M-parameter language model in about six months and run inference at 18-25 tok/sec. A mid-90s Cray T3E could have handled GPT-2 124M, 24 years before OpenAI. I also had a punch-card computer from 1965 learn XOR with backpropagation. The hardware was never the bottleneck; the ideas were.
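For anyone who wants to sanity-check the six-month figure, here is a rough back-of-envelope sketch. It assumes the common rule of thumb that training a dense model costs about 6 FLOPs per parameter per token and that generating one token costs about 2 FLOPs per parameter. Those rules, the 30-day month, and the function names are my assumptions, not something measured on the machine.

```python
# Back-of-envelope compute budget for a small language model on an X-MP-class machine.
# Assumptions (not from the original comment): ~6*N*D FLOPs to train,
# ~2*N FLOPs per generated token, and a 30-day month.

SECONDS_PER_MONTH = 30 * 24 * 3600


def training_tokens(params, flops_per_sec, months):
    """Tokens that fit in the compute budget under the ~6*N*D rule of thumb."""
    total_flops = flops_per_sec * months * SECONDS_PER_MONTH
    return total_flops / (6 * params)


def peak_tokens_per_sec(params, flops_per_sec):
    """Compute-bound ceiling on decode speed, assuming ~2*N FLOPs per token."""
    return flops_per_sec / (2 * params)


if __name__ == "__main__":
    for params in (7e6, 10e6):
        for flops in (800e6, 1e9):
            tokens = training_tokens(params, flops, months=6)
            ceiling = peak_tokens_per_sec(params, flops)
            print(
                f"{params / 1e6:.0f}M params @ {flops / 1e6:.0f} MFLOPS: "
                f"~{tokens / 1e6:.0f}M training tokens in 6 months, "
                f"decode ceiling ~{ceiling:.0f} tok/sec"
            )
```

Under these assumptions the budget works out to a few hundred million training tokens, and the compute-bound decode ceiling lands above the 18-25 tok/sec quoted, which is what you would expect if real throughput is held back by attention overhead and memory traffic rather than raw FLOPS.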