by WiSaGaN 1098 days ago
I believe the issue was not a lack of computational power, but rather that people at the time didn't think large models with many parameters would effect meaningful change. This was even true three years ago, albeit on a different scale. As Ilya Sutskever expressed, people were not convinced there was still room to increase the scale. For the status quo to shift, two things could happen: a substantial reduction in computing costs, making large-scale experiments less a matter of conviction and more a matter of course; or the emergence of individuals with the resources and conviction to undertake larger experiments.

Is that really true? A modern high-end GPU has more computing power than the top 20 supercomputers of the year 2000 added together.
My favorite comparison for the accessibility of power is looking at a weird computer in the top 500 from a while back.

System X was, in 2004, the 7th most powerful computer in the world. It consisted of 1100 PowerPC 970 Macs with 2200 cores and claimed an Rmax of 12k GFlops. https://www.top500.org/system/173736/

An M1 MacBook Air hits 900 GFlops ( https://news.ycombinator.com/item?id=26333369 ). A dozen MacBook Airs - about what you'd expect in a grade school computer lab - come to about 10.8k GFlops, roughly on par with the 7th most powerful computer system in the world two decades ago.

The RTX 4090 GPU (a single PCI card) hits 82k GFlops in FP32
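
As a rough sanity check on the figures quoted above, here is a minimal sketch of the arithmetic. It takes the numbers in this thread at face value and assumes they are comparable, which is a stretch: Rmax is a measured Linpack result, while the laptop and GPU figures are quoted peak numbers at possibly different precisions. The helper name `units_needed` is made up for illustration.

```python
import math

# Figures as quoted in the thread, in GFlops. Not independently verified,
# and not measured with the same benchmark or precision.
SYSTEM_X_RMAX = 12_000   # System X (2004), claimed Rmax
M1_AIR = 900             # M1 MacBook Air
RTX_4090_FP32 = 82_000   # RTX 4090, FP32


def units_needed(target_gflops: float, unit_gflops: float) -> int:
    """Smallest whole number of units whose combined figure meets the target."""
    return math.ceil(target_gflops / unit_gflops)


airs_to_match = units_needed(SYSTEM_X_RMAX, M1_AIR)
dozen_airs = 12 * M1_AIR
dozen_fraction = dozen_airs / SYSTEM_X_RMAX
rtx_ratio = RTX_4090_FP32 / SYSTEM_X_RMAX

print(f"MacBook Airs needed to match System X: {airs_to_match}")
print(f"A dozen Airs: {dozen_airs} GFlops ({dozen_fraction:.0%} of System X)")
print(f"One RTX 4090 vs System X: {rtx_ratio:.1f}x")
```

On these numbers a dozen Airs reach about 90% of System X (14 would clear it), and a single RTX 4090 is a bit under 7 times System X.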
The reason I like the comparison is that it compares like with like. ("Here's this giant computer, and now it fits on a card that you can get at Micro Center" is another reasonable comparison.)

It was a Mac back then - 1100 of them, but it was a Mac. You could walk into a store and buy one... or two. The store might have had some issue with you buying a thousand of them, but they were consumer commodity equipment - it was the rack-mounted version of the Power Mac G5, if I read things correctly. You might have had one of them in the media lab for a high school.

And now, it's a dozen M1 MacBook Airs (or Mac minis). Still a Mac. Still something you could walk into the store and buy. But now instead of "maybe there are 1000 of them in all the grade and high schools in the state" (though that would be stretching it), it's "now this is an acceptably outfitted grade school computer lab."

No regular person was ever going to get a proper fraction of the nodes of BlueGene from the DOE (though it was running the PowerPC 440 2C instead... but 32,768 of them), or do anything with them if they did. https://en.wikipedia.org/wiki/IBM_Blue_Gene

Comparing "that massive thing" to "this card" is impressive - but "that massive thing" is inconceivable to the average person.

Thus the "you could have gotten a fraction of System X at a store and used it" comparison.

I see your logic, but Apple hardware is super-expensive: a single MacBook costs about the same as a single RTX 4090 (maybe not the MBA, but definitely the MBP). So it's not much of a stretch to say that the 4090 in a normal PC is also a fair comparison as a "widely available" computer.
Computers are undoubtedly more powerful now than they were in the 90s. Although computing capabilities of the 90s seem weak compared to today's standards, they were not so inadequate that we couldn't train and run a network comprising thousands of parameters. I vividly recall the early 2000s when I was in college. Neural networks were seen as a sort of "fringe" technology in a series of statistics courses. We were mostly shown examples with 6 or 12 neurons, and nobody mentioned the possibility of scaling up to hundreds of neurons. Around that time, we already had sophisticated games like The Elder Scrolls III. We could have easily scaled up the network size by at least an order of magnitude at home, not to mention the capabilities that big companies possessed at that time.
Around fifteen years ago my machine learning professor dismissed neural networks because training them is not a nice convex optimization problem.
> but rather that people at the time didn't think large models with many parameters would effect meaningful change. This was even true three years ago, albeit on a different scale.

I've also noticed this, and want to ask: who are these people? Do they not have (~80-billion-neuron) brains? (And that's neurons, with by most estimates thousands of synapses each; so you're actually talking on the order of tens to hundreds of trillions of neural network parameters before you reach parity with biological examples.)
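
A quick sketch of that estimate, assuming "thousands of synapses each" means somewhere between 1,000 and 10,000 (that range is an assumption for illustration, not a sourced figure), and treating each synapse as one parameter:

```python
# Rough parameter-count estimate for a brain, using the comment's figures.
NEURONS = 80_000_000_000                  # ~80 billion neurons, as quoted
SYNAPSES_PER_NEURON = (1_000, 10_000)     # assumed bounds for "thousands"

TRILLION = 10**12

# One synapse is counted as one "parameter" (a simplifying assumption).
low = NEURONS * SYNAPSES_PER_NEURON[0]
high = NEURONS * SYNAPSES_PER_NEURON[1]

print(f"Estimated synapses: {low // TRILLION} to {high // TRILLION} trillion")
```

Under those assumptions the range is 80 to 800 trillion, consistent with "tens to hundreds of trillions".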

In the early 2000s, it was believed that the topology of a neural network was a major factor in getting it to work well, and that throwing more neurons and computing power at the problem would not suffice on its own. In a sense this was not wrong: convolutional nets were an early example of a neural network topology that enforced translation invariance while being parsimonious in tunable parameters.

Another factor was that SVMs were all the rage back then, because they had nice math and fit the computational resources of a contemporary workstation.

Did you post something nearly identical to this before? I feel like I read it before.
Are you referring to other threads? No. However, I wouldn't be surprised if other people developed similar beliefs following recent advances in large language models (LLMs). Of course, we wouldn't achieve GPT-4 level results using only technology available before 2020, but with sufficient data and computational power, we could have accomplished much more than what was generally believed to be possible in the machine learning field at the time.
I thought I had read almost those exact words before. I've been known to repeat myself on here before.

In fact I've been so nuanced that I've had people use something I've said to disagree with me and then I've had to point out that the original thing is also by me.

I've been unable to determine whether I'm actually influential or am just unknowingly expressing part of a generalized changing sentiment. Confidence is the first trapping of fools.