Scaling deep learning to 10,000 cores and beyond (cs.washington.edu)
38 points by gjenks 4851 days ago
3 comments

Link to talk given by the host on the subject. http://vimeo.com/52332329
"However, these methods have been fundamentally limited by our computational abilities, and typically applied to small-sized problems."

Is this true? I think these kinds of networks are limited more by our ability to generate effective heuristics and ontologies. When I populate my Markov models I need states, and if I don't have good, domain-specific states, no amount of expectation matching will solve my problems. The more incorrect states I have, the more noise I get, so it is immediately clear that simply increasing computing power is a no-go.

The idea of deep learning is to eliminate the need for domain experts to write heuristics, ontologies, feature detectors, etc. In deep learning you feed the learning system raw data and it automatically creates feature detectors to model the data. Then once you have a good set of features you can train the system to perform a specific task using those features.

As the network gets bigger and deeper the feature detectors become more abstract and capable of higher-level tasks. For example, given a bunch of images, a small network might learn to distinguish straight lines from curvy lines, while a large network might learn to distinguish humans from cats.

So if deep learning actually works, then the main constraint on the capabilities of the learning system is compute power, not the cleverness of the domain experts writing your feature detectors. The main problem becomes scaling.
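To make the two stages above concrete, here is a minimal sketch in Python with NumPy. It is not the method from the talk: a single-hidden-layer autoencoder stands in for a deep network, the 8-"pixel" data is synthetic, and every name (`train_autoencoder`, `train_classifier`, the prototypes) is made up for illustration. Stage one learns features from the inputs alone; stage two trains a small classifier on those features.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden=4, lr=0.5, epochs=500, seed=0):
    """Stage 1 (unsupervised): learn features by reconstructing X. No labels used."""
    rng = np.random.default_rng(seed)
    n_inputs = X.shape[1]
    W_enc = rng.normal(0.0, 0.1, (n_inputs, n_hidden))
    b_enc = np.zeros(n_hidden)
    W_dec = rng.normal(0.0, 0.1, (n_hidden, n_inputs))
    b_dec = np.zeros(n_inputs)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W_enc + b_enc)        # learned feature activations
        err = (H @ W_dec + b_dec) - X         # linear reconstruction error
        losses.append(float(np.mean(err ** 2)))
        # Gradients of the mean squared reconstruction error.
        g_out = 2.0 * err / err.size
        g_hid = (g_out @ W_dec.T) * H * (1.0 - H)
        W_dec -= lr * (H.T @ g_out)
        b_dec -= lr * g_out.sum(axis=0)
        W_enc -= lr * (X.T @ g_hid)
        b_enc -= lr * g_hid.sum(axis=0)
    return W_enc, b_enc, losses

def train_classifier(F, y, lr=0.5, epochs=500):
    """Stage 2 (supervised): logistic regression on the learned features F."""
    w = np.zeros(F.shape[1])
    b = 0.0
    losses = []
    for _ in range(epochs):
        p = np.clip(sigmoid(F @ w + b), 1e-12, 1.0 - 1e-12)
        losses.append(float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))))
        g = (p - y) / len(y)                  # gradient of the mean cross-entropy
        w -= lr * (F.T @ g)
        b -= lr * g.sum()
    return w, b, losses

# Synthetic "raw data": noisy copies of two hypothetical 8-pixel patterns.
rng = np.random.default_rng(1)
prototypes = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
                       [0, 0, 0, 0, 1, 1, 1, 1]], dtype=float)
y = rng.integers(0, 2, size=200)
X = np.clip(prototypes[y] + rng.normal(0.0, 0.2, (200, 8)), 0.0, 1.0)

W_enc, b_enc, ae_losses = train_autoencoder(X)      # labels never touched here
features = sigmoid(X @ W_enc + b_enc)
w, b, clf_losses = train_classifier(features, y)    # labels used only here

print("reconstruction loss:", ae_losses[0], "->", ae_losses[-1])
print("classifier loss:    ", clf_losses[0], "->", clf_losses[-1])
```

At this toy size everything runs in well under a second. The scaling argument is about what happens when the input has millions of dimensions and the network has many layers.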

It also relies on having a lot of data (unlabeled is fine for learning features).

By contrast, manually chosen heuristics assume you've seen a lot of data and are bootstrapping the model with features deduced by your biological brain.

When you have a hammer with 10,000 cores, every problem looks like it has been fundamentally limited by our computational abilities.
These networks are being trained on unlabeled data. The premise of training these kinds of deep belief networks is that they can learn surprising amounts of information completely unsupervised.
The promise of deep learning is that you will not have to do feature engineering. You give it raw features, like the pixels in the image, and then it discovers the latent features as you train it.

So in a very real sense, yes, we are limited by computational abilities. Humans take in millions (billions?) of raw sensory inputs constantly. Scaling current ML methods to that level is a serious computational challenge.

Depending on what the RAM/IO requirements are, that could be done with as little as $1400 worth of GA144 cores (70*144 = 10080).
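Spelling out that arithmetic as a quick Python check: the chip count, cores per chip, and $1400 total come from the comment above, while the per-chip price is only what those figures imply, not a quoted price.

```python
chips = 70             # number of GA144 chips, from the comment
cores_per_chip = 144   # cores on each GA144
budget_usd = 1400      # total cost quoted in the comment

total_cores = chips * cores_per_chip
implied_price_per_chip = budget_usd / chips   # derived from the figures above

print(total_cores, "cores at an implied", implied_price_per_chip, "USD per chip")
```

This covers the chips alone. It says nothing about the RAM and I/O caveat, which is where the real cost of such a build would likely sit.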