by mark_l_watson
2361 days ago
I don’t know if this mega-chip will be successful, but I like the idea. Before I retired I managed a deep learning team that had a very cool internal product for running distributed TensorFlow. Now in retirement I get by with a single 1070 GPU for experiments. That's not bad, but having something much cheaper, with much more memory, and much faster would help so much. I tend to be optimistic, so take my prediction with a grain of salt: I bet within 7 or 8 years there will be an inexpensive device that will blow away what we have now. There are so many applications for much larger end-to-end models that will put pressure on the market for something much better than what we have now. BTW, the ability to efficiently run models on my new iPhone 11 Pro is impressive, and I have to wonder if the market for super fast hardware for training models might match the smartphone market. For this to happen, we need a "deep learning rules the world" shift. BTW, off topic, but I don’t think deep learning gets us to AGI.
Specifically, gradient descent is a post hoc approach to network tuning, while human neural connections are reinforced simultaneously as they fire together. The post hoc approach restricts the scope of the latent representations a network learns because such representations must serve a specific purpose (descending the gradient), while the human mind works by generating representations spontaneously at multiple levels of abstraction without any specific or immediate purpose in mind.
I believe the brain's ability to spontaneously generate latent representations capable of interacting with one another in a shared latent space is functionally enabled by the paradigm of neurons 'firing and wiring' together. I also believe it is the brain's ability to spontaneously generate hierarchically abstract representations in a shared space that is the key to AGI. We must therefore move away from gradient descent.
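To make the contrast concrete, here is a toy sketch of the two update rules for a single linear unit. It is only an illustration under my own assumptions: the function names, the squared-error objective, the learning rate, and the example numbers are all made up for this comment, and the plain Hebbian rule shown is the textbook simplification of "fire together, wire together", not a claim about how real neurons work.

```python
def activation(w, x):
    """Output of a single linear unit: y = w . x"""
    return sum(wi * xi for wi, xi in zip(w, x))


def gradient_step(w, x, target, lr=0.1):
    """One gradient-descent step on squared error 0.5 * (y - target)**2.

    The update depends on an error measured against an external objective
    after the forward pass (the "post hoc" part).
    """
    error = activation(w, x) - target
    return [wi - lr * error * xi for wi, xi in zip(w, x)]


def hebbian_step(w, x, lr=0.1):
    """One plain Hebbian step: each weight changes in proportion to the
    product of its input and the unit's output. No target or loss is involved.
    """
    y = activation(w, x)
    return [wi + lr * y * xi for wi, xi in zip(w, x)]


# Hypothetical starting weights and input, chosen only for illustration.
w = [0.5, -0.5]
x = [1.0, 2.0]

w_gd = gradient_step(w, x, target=1.0)  # moves the output toward the target
w_hebb = hebbian_step(w, x)             # reinforces whatever co-activity occurred
```

The only structural difference is the `target` argument: the gradient step cannot be computed without an objective, while the Hebbian step uses nothing but local activity. Note that this plain Hebbian rule is unstable if iterated (weights grow without bound), which is why normalized variants such as Oja's rule are usually used in practice.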