by melissalobos 1604 days ago
> Deep-learning models have become pervasive tools in science and engineering. However, their energy requirements now increasingly limit their scalability.[1]

They make this claim first, and cite one source. I haven't heard of this as an issue before. Is there anywhere else I could read more on this?

[1]https://arxiv.org/abs/2104.10350

5 comments

I don't have a specific reference but I'd say it's a common knowledge assertion based on the growth in the number of parameters in models over the last 10 years. There are lots of places where you can see how the number of parameters, especially in language and vision models, has increased, and find the amount of training time quoted. Normally it's framed in terms of compute instead of energy.
Got me wondering how this compares with neural efficiency, realizing ofc that there's nothing really apples-to-apples here.

Training one of these big models takes 100kWh for 1e19 flops, so that's 100k Wh, 360M Ws, or 360MJ, or 3.6e8 J. 3.6e8 J / 1e19 flops = 3.6e-11 J/flop, so on the order of 1e-11 J/flop.

Neurons take 1e-8J/spike.[1]

Math check appreciated :)

Does seem plausible to think of a single neuron spike (Hodgkin-Huxley cable model) being modeled with ~1k flops. Though I'm firmly of the opinion that nobody really knows how the brain works.. the neural spike activity could be pure epiphenomenon.. who knows!

[1] “Finally, the energy supply to a neuron by ATP is 8.31 × 10^-9 J. Meanwhile, integrating the total power with respect to time we will get the consumed electric power, which is 8.75 × 10^-9 J. This is more energy than the ATP supplied. The energy efficiency is 105.3%. This is an anomaly…” - 2017 Feb 16 Wang, Xu, Institute for Cognitive Neurodynamics, East China University of Science and Technology https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5337805/
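
For anyone who wants to do the math check, here is a minimal Python sketch of the arithmetic above. All inputs are this comment's own rough figures (100kWh, 1e19 flops, the ATP number from [1], the ~1k flops per spike guess), not measurements, and the variable names are only illustrative.

```python
# Back-of-envelope check using the comment's own rough figures (not measurements).
KWH_TO_J = 3.6e6              # 1 kWh = 3.6e6 J, by definition

training_energy_kwh = 100.0   # figure quoted in the comment
training_flops = 1e19         # figure quoted in the comment
joules_per_spike = 8.31e-9    # ATP energy figure quoted from [1]
flops_per_spike = 1e3         # the ~1k flops guess; purely an assumption

# Energy of the training run in joules, then energy per floating-point op.
training_energy_j = training_energy_kwh * KWH_TO_J
joules_per_flop = training_energy_j / training_flops

# Energy the hardware would spend to "simulate" one spike under the 1k flops guess.
joules_per_simulated_spike = joules_per_flop * flops_per_spike
ratio = joules_per_simulated_spike / joules_per_spike

print(f"training energy:       {training_energy_j:.2e} J")
print(f"energy per flop:       {joules_per_flop:.2e} J")
print(f"energy per sim. spike: {joules_per_simulated_spike:.2e} J")
print(f"silicon / neuron:      {ratio:.1f}x")
```

Under these assumptions the two come out within an order of magnitude of each other, but the result is only as good as the 1k flops per spike guess.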

The neural spike is definitely not an epiphenomenon. The action potential / neurotransmitter release / receptor activation process is understood and can be manipulated with electric probes.
Sorry, didn't mean it quite like that. It's clear neural spike activity exists as a physical process. I'm suggesting that spiking activity may be an epiphenomenon of more primary brain functions, i.e. information processing, consciousness, etc..

As far as I know, we're closest to showing information processing in the visual cortex (which is highly linear) and we're still a long way from knowing how it works at a neural level. But maybe someone here can update on this?

But much of the cortex is highly recurrent (non-linear) and the idea that it's doing something like sending bits between synapses, encoded in spike timing or something.. well, I think that's highly speculative and has plenty of problems. But even if so, that's just "information processing".

I'm personally a fan of electromagnetic theories of consciousness[], where the synaptic activity could be an epiphenomenon of supporting a standing EM field.

[]https://en.wikipedia.org/wiki/Electromagnetic_theories_of_co...

>But much of the cortex is highly recurrent (non-linear) and the idea that it's doing something like sending bits between synapses, encoded in spike timing or something.. well, I think that's highly speculative and has plenty of problems.

I am not sure how much is known about information processing, but it's clear that motor impulses and sensory information are encoded in the spikes. Higher spike frequency = stronger signal. Synapses are how signals are passed from neuron to neuron.

Ok, that's fair. That's i/o and yes, that's known to be highly linear by the time it gets to the efferent nerves, and makes sense it is before that as well. I think that still leaves the vast majority of the cortex using undefined mechanisms.
There's no need to hypothesize a wholly unique central nervous system signalling mechanism when, not only is the signalling mechanism of peripheral nerves understood, central nerves are observed doing the same thing.
For those who are curious, consciousness is an epiphenomenon (an emergent property of brains), while neural spikes are just physics.

See more: https://en.wikipedia.org/wiki/Neural_correlates_of_conscious...

I think it would be better to say something like, "paranoia is an epiphenomenon," when nobody knows what consciousness is.
Note that that title ends in "ism," like "Calvinism," or "Evangelicalism."
Training a state of the art model typically involves keeping a very large computer around at near 100% power load. Roughly 10MW.

The actual limits on DL models (and any simulation or optimization) are: power density and the speed of light, plus the maximum amount of power you can deliver to the area. The speed of light limits how long your cables can be while still doing collective reductions, and the power density limits how much compute power you can fit per unit volume. One could imagine a fully liquid cooled supercomputer at 100MW (located near a very reliable and large power source) with optical fiber interconnect; this would completely change the state of the art in large models overnight.
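
To put a rough number on the speed-of-light point, here is a small Python sketch of one-way propagation delay over fiber. The refractive index (about 1.47 for silica fiber) and the cable lengths are assumptions for illustration, not figures from this thread, and real collective reductions add switch and software latency on top.

```python
# Rough propagation delay over optical fiber; cable lengths are hypothetical.
C_VACUUM_M_PER_S = 299_792_458   # speed of light in vacuum (exact, by definition)
FIBER_REFRACTIVE_INDEX = 1.47    # typical for silica fiber; an assumption


def one_way_delay_ns(cable_length_m: float) -> float:
    """Time for light to traverse the cable once, in nanoseconds."""
    speed_in_fiber = C_VACUUM_M_PER_S / FIBER_REFRACTIVE_INDEX
    return cable_length_m / speed_in_fiber * 1e9


for length_m in (1, 10, 100):
    print(f"{length_m:>4} m: {one_way_delay_ns(length_m):.1f} ns one way")
```

That works out to a bit under 5 ns per metre, so a 100 m run costs roughly half a microsecond each way before anything else is counted.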

All true.

I cannot cite a source here, but it is generally believed that the effective GPU utilization in AI training clusters which are "100% utilized" is actually quite poor - 23%-26% - due to data movement, non-essential serial execution, and scheduling issues. So at least for now there is low-hanging fruit to improve the performance of the capital expenses.

Long term, though, DL clusters are basically CAPEX and energy limited.

IMHO, for now, return on the investment is not really a limiting factor, but it will become one once the shine is off the field.

I think they may have provided fewer citations because it felt like a less controversial claim. I think the choice of words was just a bit awkward. To me, it seems like they were asserting that deep learning requires lots of computational resources which is common knowledge. In general, this translates to higher energy requirements.
It's more of an inference and practical thing. If you want to equip something with limited energy (e.g. a drone using a small battery) with the ability to use a neural network for inference, their system could use much less energy than the typical computational setup.