by rahen 78 days ago
Yes. The Cray supercomputers from the 80s were crazy good matmul machines in particular. The quad-CPU Cray X-MP (1984) could reach roughly 800 MFLOPS to 1 GFLOPS and, with a 1 GB SSD, had enough compute power and bandwidth to train a 7-10M-parameter language model in about six months and run inference at 18-25 tok/sec.
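A back-of-the-envelope check of those numbers, as a sketch under stated assumptions (mine, not the commenter's): training cost is taken as the common rough rule of about 6·N·D FLOPs for an N-parameter model on D tokens, decoding as about 2·N FLOPs per generated token, and the machine is assumed to run at 800 MFLOPS with perfect utilization. The function names are hypothetical, and memory bandwidth is ignored entirely.

```python
SECONDS_PER_DAY = 86_400


def training_tokens(params: float, flops_per_sec: float, days: float) -> float:
    """Tokens trainable in `days`, using the rough C ~= 6 * N * D training-cost rule."""
    total_flops = flops_per_sec * days * SECONDS_PER_DAY
    return total_flops / (6 * params)


def inference_tokens_per_sec(params: float, flops_per_sec: float) -> float:
    """Compute-bound upper limit on decode speed, using ~2 * N FLOPs per token."""
    return flops_per_sec / (2 * params)


# Assumption: low end of the quoted X-MP range, 100% utilization.
XMP_FLOPS = 800e6

if __name__ == "__main__":
    for n in (7e6, 10e6):
        d = training_tokens(n, XMP_FLOPS, days=182)  # ~six months
        tps = inference_tokens_per_sec(n, XMP_FLOPS)
        print(f"{n / 1e6:.0f}M params: ~{d / 1e6:.0f}M training tokens, <= {tps:.0f} tok/s")
```

Under these assumptions six months buys roughly 210-300M training tokens, and the compute-bound ceiling on decoding is about 40-57 tok/s, so the quoted 18-25 tok/s would correspond to roughly 30-60% utilization.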

A mid-90s Cray T3E could have handled GPT-2 124M, 24 years before OpenAI.

I also had a punch-card computer from 1965 learn XOR with backpropagation.
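For a sense of how little computation the XOR experiment needs, here is a minimal modern sketch in Python (not what a 1965 machine would have run): a 2-4-1 sigmoid network trained with plain backpropagation and per-sample gradient descent on squared error. The layer sizes, learning rate, epoch count, and the restart loop are assumptions chosen for illustration; the comment doesn't say what network the punch-card run used. Each training example costs only a few dozen multiply-adds.

```python
import math
import random

XOR_DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(w1, w2, x0, x1):
    """Return hidden activations and the network output for one input pair."""
    # Each w1[j] is [weight from x0, weight from x1, bias] for hidden unit j.
    h = [sigmoid(w[0] * x0 + w[1] * x1 + w[2]) for w in w1]
    # w2[:-1] are hidden->output weights; w2[-1] is the output bias.
    y = sigmoid(sum(wj * hj for wj, hj in zip(w2[:-1], h)) + w2[-1])
    return h, y


def train_xor(hidden=4, lr=0.5, epochs=10_000, seed=0):
    """Backpropagation with per-sample updates on the loss 0.5 * (y - t)^2."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden + 1)]
    for _ in range(epochs):
        for (x0, x1), target in XOR_DATA:
            h, y = forward(w1, w2, x0, x1)
            # Output error term, then propagate it back through w2
            # (computed before w2 is updated).
            delta_out = (y - target) * y * (1 - y)
            delta_h = [delta_out * w2[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            for j in range(hidden):
                w2[j] -= lr * delta_out * h[j]
                w1[j][0] -= lr * delta_h[j] * x0
                w1[j][1] -= lr * delta_h[j] * x1
                w1[j][2] -= lr * delta_h[j]
            w2[hidden] -= lr * delta_out
    return w1, w2


def fit_xor(max_restarts=20):
    """Retry with a new seed if a run settles in a local minimum."""
    for seed in range(max_restarts):
        w1, w2 = train_xor(seed=seed)
        if all(round(forward(w1, w2, *x)[1]) == t for x, t in XOR_DATA):
            return w1, w2
    raise RuntimeError("XOR did not converge")


if __name__ == "__main__":
    w1, w2 = fit_xor()
    for (x0, x1), t in XOR_DATA:
        print(x0, x1, "->", round(forward(w1, w2, x0, x1)[1], 3), "target", t)
```

The restart loop is there because tiny XOR networks occasionally get stuck in a local minimum; the textbook-minimal 2-2-1 network is more prone to this than the slightly wider 2-4-1 used here.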

The hardware was never the bottleneck; the ideas were.


Post-quantum crypto is a good example of this. Lattice-based schemes were proposed in the 90s but took decades to reach production. The math existed and the hardware existed; the ideas for making it practical just weren't there yet.
> The hardware was never the bottleneck, the ideas were.

For sure. Minsky and Papert's Perceptrons (1969) really set us back.

They should have lived to see Sutton's "bitter lesson" play out.
Minsky came close (d. 2016) -- although he may have had other interests later in life, if the Epstein file dumps are to be believed.