| HN Mirror

The problem with neural nets is that they have a fixed input type: tensors or sequences. For example, imagine the task is to count the objects in an image and say whether the number of red objects equals the number of green objects. You train a net that solves this task. Then you change the colors, or add an extra color, and it fails. Why? Because it has learned a fixed input representation.
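To make the failure concrete, here's a toy sketch in Python + NumPy. Everything in it is hypothetical: `COLORS`, the count encoding and the hand-set weights `W` stand in for a trained model. The scene is squashed into a fixed-length vector of per-color counts, so a color outside the training vocabulary has no slot at all, and widening the vector to make room for it no longer matches the learned weights.

```python
import numpy as np

# Hypothetical color vocabulary, fixed at training time.
COLORS = ["red", "green"]

# Hypothetical "trained" linear readout over the count vector: n_red - n_green.
W = np.array([1.0, -1.0])

def encode_fixed(objects, colors=COLORS):
    """Squash a scene into a fixed-length vector of per-color counts."""
    counts = np.zeros(len(colors))
    for obj in objects:
        # list.index raises ValueError for a color that has no slot.
        counts[colors.index(obj["color"])] += 1
    return counts

def red_equals_green(objects):
    """The learned task: is the number of red objects equal to the number of green ones?"""
    return float(W @ encode_fixed(objects)) == 0.0

print(red_equals_green([{"color": "red"}, {"color": "green"}]))  # True
print(red_equals_green([{"color": "red"}, {"color": "red"}, {"color": "green"}]))  # False

try:
    red_equals_green([{"color": "red"}, {"color": "blue"}])
except ValueError as err:
    print("unseen color:", err)

try:
    # Widening the input for "blue" breaks the shape the weights were learned for.
    W @ encode_fixed([{"color": "blue"}], colors=["red", "green", "blue"])
except ValueError as err:
    print("shape mismatch:", err)
```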

What neural nets need is to change their data format from plain tensors to object-relation graphs. The input of the network is represented as a set of objects with relations among them, and the network has to be permutation invariant with respect to the order in which the objects are presented. One implementation of this idea is Graph Convolutional Networks. They learn to compose concepts in new ways: once they have learned to count, compare and select by color, they can also solve combinations of those concepts they haven't seen. That way the nets generalize better and transfer knowledge from one problem to the next.
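A minimal sketch of the graph version, assuming NumPy and the GCN propagation rule from Kipf & Welling (add self-loops, symmetrically normalize the adjacency, then apply a shared weight matrix). The scene, the random untrained weights and the helper names are made up for illustration. The point is only that the weights don't depend on how many objects there are, and that summing over nodes at the end makes the answer independent of the order the objects are presented in.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])        # add self-loops
    d = A_hat.sum(axis=1)                 # node degrees (>= 1 thanks to self-loops)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

def graph_readout(A, H, W1, W2):
    """Two GCN layers, then a sum over nodes, so node order cannot matter."""
    H = gcn_layer(A, H, W1)
    H = gcn_layer(A, H, W2)
    return H.sum(axis=0)

rng = np.random.default_rng(0)
n, f = 5, 3
# Hypothetical scene: 5 objects, node features = one-hot color (red, green, blue).
H = np.eye(f)[rng.integers(0, f, size=n)]
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1)
A = A + A.T                               # undirected relations, no self-edges
W1 = rng.normal(size=(f, 8))              # shared weights: shape independent of n
W2 = rng.normal(size=(8, 4))

# Present the same objects in a different order.
perm = rng.permutation(n)
P = np.eye(n)[perm]
out = graph_readout(A, H, W1, W2)
out_perm = graph_readout(P @ A @ P.T, P @ H, W1, W2)
print(np.allclose(out, out_perm))  # True
```

The same `W1`, `W2` also accept a scene with a different number of objects, e.g. `graph_readout(np.zeros((2, 2)), np.eye(3)[:2], W1, W2)`, because nothing in the layer is sized by `n`.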

Graphs reduce the complexity of learning a neural net that can perform flexible tasks. But to get even better results, it is necessary to add simulation to the mix. By equipping neural nets with simulators, we simplify the learning problem: the net doesn't have to learn the dynamics of the environment as well, just the task at hand. Examples of simulators used in DL are AlphaGo, reinforcement learning on Atari games, protein/drug property prediction and, in a way, generative adversarial networks.
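Here's a toy version of the "simulator in the loop" idea in plain Python. Everything is hypothetical: the environment is a number line, `simulate` plays the role of the known dynamics, and `task_reward` stands in for whatever a net would have to learn. The planner is a plain depth-limited lookahead (not the Monte Carlo tree search AlphaGo actually uses). The point is that the learned part never has to model how actions change the state, because the simulator answers that.

```python
from typing import Optional, Tuple

GOAL = 7  # hypothetical target position on a number line

def simulate(state: int, action: int) -> int:
    """Known environment dynamics (the simulator): step left/right, clipped to [0, 10]."""
    return max(0, min(10, state + action))

def task_reward(state: int) -> float:
    """Stand-in for a learned net: it only scores states for the task,
    it does not need to predict the dynamics."""
    return -abs(state - GOAL)

def plan(state: int, depth: int, actions=(-1, 1)) -> Tuple[float, Optional[int]]:
    """Depth-limited lookahead that queries the simulator instead of a learned model.
    Returns (best cumulative reward, first action of the best sequence)."""
    if depth == 0:
        return 0.0, None
    best_score, best_action = float("-inf"), None
    for a in actions:
        nxt = simulate(state, a)
        future, _ = plan(nxt, depth - 1, actions)
        score = task_reward(nxt) + future
        if score > best_score:
            best_score, best_action = score, a
    return best_score, best_action

state = 2
trajectory = [state]
for _ in range(20):  # safety cap on the number of steps
    if state == GOAL:
        break
    _, action = plan(state, depth=3)
    state = simulate(state, action)
    trajectory.append(state)
print(trajectory)  # [2, 3, 4, 5, 6, 7]
```

Summing rewards along the path (rather than scoring only the final state) makes reaching the goal sooner strictly better, which keeps the planner from dithering between equally good end states.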

The interesting thing is that graphs are natural for simulation. They represent objects as vertices and relations as edges, and through signal propagation the graph works like a circuit, a simulator that produces the answer. My bet is on graphs + simulators: that's how we get to the next level (abstraction and reasoning). DeepMind seems particularly focused on RL, games and, recently, relation networks. There is also work on dynamic routing in neural nets, which in effect applies graphs implicitly inside the net through multiple attention heads.
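The attention remark can be made concrete. In scaled dot-product self-attention, each head's softmax weight matrix is an n×n row-normalized matrix, i.e. a soft, fully connected graph over the inputs, and multiplying it with the values is one round of message passing along those edges. A NumPy sketch with made-up sizes and random weights (the helper names are mine, not from any library):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_graphs(X, Wq, Wk, Wv):
    """Multi-head scaled dot-product self-attention.
    X: (n, d_model); Wq, Wk, Wv: (heads, d_model, d_head).
    Returns per-head attention matrices (heads, n, n) and outputs (heads, n, d_head)."""
    Q = np.einsum("nd,hde->hne", X, Wq)
    K = np.einsum("nd,hde->hne", X, Wk)
    V = np.einsum("nd,hde->hne", X, Wv)
    scores = np.einsum("hie,hje->hij", Q, K) / np.sqrt(Q.shape[-1])
    A = softmax(scores, axis=-1)  # A[h, i, j]: weight of edge j -> i in head h's graph
    out = A @ V                   # each node aggregates its neighbours' values
    return A, out

rng = np.random.default_rng(0)
n, d_model, heads, d_head = 4, 8, 2, 4
X = rng.normal(size=(n, d_model))
Wq, Wk, Wv = (rng.normal(size=(heads, d_model, d_head)) for _ in range(3))
A, out = attention_graphs(X, Wq, Wk, Wv)
print(A.shape, out.shape)                # (2, 4, 4) (2, 4, 4)
print(np.allclose(A.sum(axis=-1), 1.0))  # True
```

Each head learns its own edge weights from the content of the nodes, so a multi-head layer behaves like several differently wired graphs applied in parallel.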