by calebkaiser 2171 days ago
This is an odd framing.

Training has become much more accessible, due to a variety of things (ASICs, offerings from public clouds, innovations on the data science side). Comparing it to Moore's Law doesn't make any sense to me, though.

Moore's Law is an observation on the pace of increase of a tightly scoped thing: the number of transistors on an integrated circuit.

The cost of training a model is not a single "thing"; it's the cumulative effect of many things, including things as fluid as cloud pricing.

Completely possible that I'm missing something obvious, though.

5 comments

> Comparing it to Moore's Law doesn't make any sense to me, though.

I assume it's meant as a qualitative comparison rather than a meaningful quantitative one. Sort of a (sub-)cultural touchstone to illustrate a point about which phase of development we're in.

With CPUs, during the phase of consistent year after year exponential growth, there were ripple effects on software. For example, for a while it was cost-prohibitive to run HTTPS for everything, then CPUs got faster and it wasn't anymore. So during that phase, you expected all kinds of things to keep changing.

If deep learning is in a similar phase, then whatever the numbers are, we can expect other things to keep changing as a result.

> then CPUs got faster and it wasn't anymore

The enabling tech was the AES-NI instruction set, not the speed.

Agree on the rest. The main reason why modern CPUs and GPUs all have 16-bit floats is probably the deep learning trend.

If it hadn't been AES-NI, it would have been ChaCha, which is much faster than unaccelerated AES and close to the speed of accelerated AES.

Phones use HTTPS without a problem, and those haven't had hardware-accelerated AES until recently.

A phone needing to set up a dozen HTTPS sockets is nothing for the CPU to do even without acceleration. A server needing to consistently set up hundreds of HTTPS sockets is where AES-NI and other accelerated crypto instructions become useful.

Like many things, Moore’s law is garbled when adopted by analogy outside its domain.

What does “more transistors” mean? To you, it means just what Gordon Moore meant when he said it: opportunity for more function in the same space/cost.

Laypeople and marketing grabbed the term and said it implied “faster,” which was then absurdly conflated with CPU clock speed (itself an important input, though hardly the only one, in determining the actual speed of a system).

The use here is of the “garbled analogy” sort, which is surely the dominant use today.

Yes, but that aspect of Moore's law for CPUs expired over a decade ago. It's the whole reason we got multicore in the first place.

Even with multicore, a CPU today is only 6x faster than a 10-year-old CPU.

The difference might be even less. Four Sandy Bridge cores (excluding memory controller and graphics) were not much bigger than the current 8-core Zen 2 die.

Certainly the peak performance you can put in a socket is much higher, but it's got more silicon in it than it used to.
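
For a rough sense of scale, here is a quick sketch of what the "6x in 10 years" estimate above would imply as a doubling period, next to what a fixed two-year doubling would give over the same span. The 6x figure is the commenter's estimate, the two-year period is just the commonly quoted version of Moore's Law (an assumption here, and strictly about transistor counts, not speed), and the function names are made up for illustration.

```python
import math

def doubling_period_years(speedup: float, years: float) -> float:
    """Doubling period implied by an overall speedup over a span of years."""
    return years / math.log2(speedup)

def speedup_from_doubling(years: float, period_years: float) -> float:
    """Overall speedup if performance doubled every `period_years`."""
    return 2.0 ** (years / period_years)

# Commenter's estimate: 6x faster over 10 years.
implied_period = doubling_period_years(6.0, 10.0)

# Assumed reference point: doubling every 2 years for 10 years.
reference_speedup = speedup_from_doubling(10.0, 2.0)

print(f"6x over 10 years implies doubling every {implied_period:.2f} years")
print(f"Doubling every 2 years would give {reference_speedup:.0f}x over 10 years")
```

Under those assumptions the implied doubling period comes out a bit under four years, against a 32x gain if performance had doubled every two.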

Agreed, but Moore's Law has morphed to refer to both transistor count and performance, despite his original phrasing.

The biggest innovation I've seen is in the cloud: backplane I/O and memory are essential, and up until a few years ago there weren't many cloud configurations suitable for massive amounts of I/O.

Ok, but achieving Moore's law has required combining an enormous number of conceptually distinct technical insights. Both training costs and transistor density seem like well-defined single parameters that incorporate many small complicated effects.

Implicit in Moore's law is that cost does not increase in the same way; otherwise chip prices would also be doubling. A closer analogy would be the decrease in cost per transistor.

The number of transistors also doesn't depend on a single thing. It can be argued that many macro events have contributed since the 80s: the VC model for chipmakers in Silicon Valley, the rise of the internet, going fabless, the rise of mobile, and innovations in fabrication technology.