by scottlegrand 3693 days ago
It's more than that, and it's in use in production at Amazon. 8 TitanX GPUs can contain networks with up to 6 billion weights. As Geoffrey Hinton once said:
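As a rough sanity check on the 6 billion figure, the arithmetic can be sketched as follows. The 12 GB per card matches the figure mentioned later in this thread. The 4-byte (FP32) weight size is an assumption, and so is counting only the weights themselves rather than gradients, activations or optimizer state. This is not a description of how the framework actually lays out memory.

```python
# Back-of-the-envelope check: do 6 billion weights fit on 8 Titan X cards?
# Assumptions (not taken from the framework): 12 GiB per card, FP32 weights.

GPUS = 8
BYTES_PER_GPU = 12 * 1024**3      # 12 GiB per card
BYTES_PER_WEIGHT = 4              # FP32 (assumed)
WEIGHTS = 6_000_000_000


def weight_bytes(n_weights, bytes_per_weight=BYTES_PER_WEIGHT):
    """Memory needed to hold the weights alone."""
    return n_weights * bytes_per_weight


def buffers_that_fit(n_weights, gpus=GPUS):
    """How many weight-sized buffers fit across all cards combined."""
    return (gpus * BYTES_PER_GPU) // weight_bytes(n_weights)


total = GPUS * BYTES_PER_GPU
print(f"combined memory: {total / 1024**3:.0f} GiB")
print(f"weights alone:   {weight_bytes(WEIGHTS) / 1024**3:.1f} GiB")
print(f"weight-sized buffers that fit: {buffers_that_fit(WEIGHTS)}")
```

Under these assumptions the weights alone take a bit under a quarter of the combined memory, which leaves room for only a few more buffers of the same size (gradients and the like). That is at least consistent with 6 billion being close to the practical ceiling.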

"My belief is that we’re not going to get human-level abilities until we have systems that have the same number of parameters in them as the brain."

And you're right that it's a specialized framework/engine. But IMO making it more general purpose is a matter of cutting and pasting the right cuDNN code. Alternatively, we can double down on emphasizing sparse data. Amazon open-sourced this partly, IMO, to see what people would want here.

2 comments

> "My belief is that we’re not going to get human-level abilities until we have systems that have the same number of parameters in them as the brain."

An interesting quote.

Replicating functioning of the brain, or some major subsystem of it, is no doubt going to require far more than just billions of parameters. The cortex contains >15 billion neurons, but there are also the neurons contained in all the other brain structures. Furthermore, neurons connect via dense dendritic trees, the human brain having on the order of 100 trillion synapses.

Adding to the complexity, neurons have numerous "communication ports", including many pre- and postsynaptic neurotransmitter receptors and a wide range of receptors for endocrine, immune-system and other types of signals. Message propagation typically also involves a layer of complex intracellular "second-messenger" transformations.

While it's highly probable that future NNs will be developed that do even more amazing things than are now possible, I think the challenge of equaling what real brains do is, to say the least, enormously daunting.

Somebody smarter than me could probably figure out the magnitude, i.e. how many nodes or weights it would take for a NN to function like the brain, though I imagine it will be a really impressive number.
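For a crude lower bound on that magnitude, one can take the synapse count quoted above and pretend each synapse is a single weight. That is exactly the simplification the preceding paragraphs argue against, so the result is a floor, not an estimate. The 4 bytes per weight and the 12 GB card size are assumptions chosen to match figures used elsewhere in this thread.

```python
# Lower-bound arithmetic: one FP32 weight per synapse, nothing else modeled.
# All constants except the synapse count are assumptions for illustration.

SYNAPSES = 100 * 10**12          # "on the order of 100 trillion synapses"
BYTES_PER_WEIGHT = 4             # FP32 (assumed)
GPU_BYTES = 12 * 10**9           # one 12 GB card (assumed)
REFERENCE_WEIGHTS = 6 * 10**9    # the 6 billion weights quoted in this thread


def gpus_needed(n_params, bytes_per_param=BYTES_PER_WEIGHT, gpu_bytes=GPU_BYTES):
    """Cards required just to store the parameters (ceiling division)."""
    total = n_params * bytes_per_param
    return -(-total // gpu_bytes)


print(f"storage: {SYNAPSES * BYTES_PER_WEIGHT / 10**12:.0f} TB")
print(f"12 GB cards needed: {gpus_needed(SYNAPSES)}")
print(f"ratio to a 6-billion-weight network: {SYNAPSES / REFERENCE_WEIGHTS:.0f}x")
```

Even this floor is four orders of magnitude beyond a 6-billion-weight network, before accounting for receptors, second messengers or anything else mentioned above.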

Edit: typos

> "My belief is that we’re not going to get human-level abilities until we have systems that have the same number of parameters in them as the brain."

While that may be true, I find this compelling:

"The fundamental unit of biological information processing is the molecule, rather than any higher level structure like a neuron or a synapse; molecular level information processing evolved very early in the history of life."

http://www.softmachines.org/wordpress/?p=1558#more-1558

Edit: formatting

>Replicating functioning of the brain, or some major subsystem of it, is no doubt going to require far more than just billions of parameters.

Maybe, but we shouldn't forget that computers do not suddenly lose their capability to function as exact, deterministic, programmable machines just because they happen to run an ANN.

What I mean is that there may be shortcuts to reduce the number of required nodes dramatically.

If you take the state of an ANN after it has been trained to perform some specific task, you can ask whether there is a simpler function, i.e. one with far fewer parameters, that approximates the learned function.

Sort of like a human with the Occam's razor gene. I think the fact that the number of neurons does not correlate perfectly with intelligence in animals is an indication that there is room for optimization.
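One way to make the "simpler function" idea concrete is magnitude pruning: drop the weights whose absolute value is below a threshold and check how far the output moves. Pruning is only one illustration picked here, and the comment above does not name a specific technique. The layer below is synthetic (a few large weights, many near zero, all hypothetical), so it shows the mechanism and is not evidence about real trained networks.

```python
import random


def make_layer(n_in, n_out, rng):
    """Hypothetical 'trained' layer: about 10% large weights, the rest near zero."""
    return [
        [rng.gauss(0, 1.0) if rng.random() < 0.1 else rng.gauss(0, 0.01)
         for _ in range(n_in)]
        for _ in range(n_out)
    ]


def forward(weights, x):
    """Plain matrix-vector product, one output per row of weights."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def prune(weights, threshold):
    """Zero out every weight whose magnitude is below the threshold."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]


def count_nonzero(weights):
    return sum(1 for row in weights for w in row if w != 0.0)


rng = random.Random(0)
W = make_layer(64, 16, rng)
x = [rng.uniform(-1, 1) for _ in range(64)]
W_small = prune(W, threshold=0.1)

full = forward(W, x)
approx = forward(W_small, x)
max_err = max(abs(a - b) for a, b in zip(full, approx))
print(count_nonzero(W), "->", count_nonzero(W_small), "weights; max error", max_err)
```

Whether a real network tolerates this depends entirely on how its weights are distributed, which is the open question the comment raises.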

Absolutely 100% agree, but at the same time, I think we will ultimately need to build and evaluate models that can span the memory of more than one processor. I don't think a single GTX Titan X, GTX 1080 or even a server is enough here.

Additionally, data parallelization and ASGD broadly disallow these larger models (yes I know about send/receive nodes in TensorFlow, but they're not general or automatic enough for researchers IMO) while ASGD makes horribly inefficient use of the very limited bandwidth between processors. All IMO of course. There are hacks and tricks here, but I think those should be late stage optimizations, not requirements to achieve scaling.
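The distinction can be sketched in a few lines. With data parallelism every processor holds a full copy of the weights, so the model can never exceed one processor's memory. With model parallelism the weights themselves are split. The toy below simulates "devices" as plain Python lists and shards a single layer by output rows. It is a minimal illustration under those assumptions, not how the framework under discussion or TensorFlow actually partitions work.

```python
def matvec(rows, x):
    """Matrix-vector product for a list of weight rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in rows]


def shard_rows(weights, n_devices):
    """Split the output rows into contiguous shards, one per simulated device."""
    per_device = -(-len(weights) // n_devices)  # ceiling division
    return [weights[i:i + per_device] for i in range(0, len(weights), per_device)]


def model_parallel_matvec(shards, x):
    # Each device sees the full input but only its own slice of the weights.
    # The only traffic is broadcasting x and gathering the output slices.
    out = []
    for shard in shards:
        out.extend(matvec(shard, x))
    return out


# An 8x4 layer with small integer-valued weights so the comparison is exact.
W = [[float(r * 4 + c) for c in range(4)] for r in range(8)]
x = [1.0, 2.0, 3.0, 4.0]

shards = shard_rows(W, n_devices=4)
assert model_parallel_matvec(shards, x) == matvec(W, x)

weights_per_device = max(len(s) * len(s[0]) for s in shards)
print(weights_per_device, "weights per device instead of", len(W) * len(W[0]))
```

Here each simulated device stores a quarter of the layer, and the result is identical to the unsharded computation. The hard part in practice is doing this automatically for whole networks while keeping the inter-processor traffic small.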

Finally, I'm a stickler for deterministic computation as someone who spent a decade writing graphics drivers before joining the CUDA team in 2006, but that's pretty much a "hear me now, believe me later" opinion of mine after tracking down too many bizarro race conditions late into the night in that former life :-). Of course, one person's race condition can sometimes be an ANN's regularizer, but I digress.
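One concrete reason determinism is hard to keep in parallel code, separate from outright race conditions, is that floating-point addition is not associative. A reduction whose order depends on thread scheduling can therefore return different results from run to run. The sketch below simulates two orderings on the CPU with an explicit loop. It is an illustration added here, not code from the drivers or CUDA work mentioned above.

```python
import math


def sequential_sum(values):
    """Add values strictly left to right in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same four numbers, as two different schedules might deliver them.
order_a = [1e16, 1.0, -1e16, 1.0]
order_b = [1e16, -1e16, 1.0, 1.0]

sum_a = sequential_sum(order_a)   # the first 1.0 is absorbed by 1e16 and lost
sum_b = sequential_sum(order_b)   # the large terms cancel first, nothing is lost

print(sum_a, sum_b, sum_a == sum_b)

# math.fsum tracks the rounding error exactly, so it is order-independent.
print(math.fsum(order_a), math.fsum(order_b))
```

Fixing the reduction order, or using an exact summation like `math.fsum`, makes the result reproducible at some cost in speed.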

I also agree we'll do some amazing things with far fewer neurons and weights than an actual human brain, but I'll bet you good money we end up needing more than 12GB to do it. AlphaGo alone was 200+ GPUs, right?

Thanks for the clarification. I'd change "early proof of concept" to "a specialized framework", but the other observations stand, I believe.

It's totally fine that it's a specialized framework, and it doesn't need to become general purpose. I just think the product description should do a better job positioning it and explaining what it's NOT intended for to set expectations correctly.