Hacker News
by bbctol 3533 days ago
I think the issue here is that we're not discovering new architectures as one discovers particles, we're creating them. The creative space is infinite, and people are making subtle tweaks to neural network systems all the time. It's not science, it's engineering.

Right now, we're in the early stages of engineering, like architecture before modern physics: we know some things that work, and we have some good intuitions about why, but there's little solid foundation to tell us how to proceed. We take some loose inspiration from nature (replace tanh functions with rectifiers to mimic action potentials in the brain, build convolutional networks similar to the retina) and find that it's more effective, sometimes, and sometimes less. We also just try a lot of stuff in hopes that something will stick. It's not as if there are some real, true neural networks out there, waiting like particles to be discovered: everything in the neural network zoo was built by hand, maybe inspired by nature, and saved because it works well, or is at least interesting; other architectures are forgotten. What we'd like is engineering principles that we can understand, so trying to make a neural network better at function x is just a matter of adding more units here or editing a function there, not venturing out into the dark again. (Such a reductive set of explanations may not exist for cognition, which really worries people who liked computers for their predictability.)

1 comment

Another issue with attempting to unify existing results is the focus on good performance, and the higher-level optimisation being performed by the researchers/implementors. This is partly because of the focus on engineering, as you say; I'd wager it's also due to a 'file drawer effect', where the emphasis is on achieving ever-higher benchmark scores, and that rewards tweaking of algorithms.

I suppose the alternative, more scientific/less engineering approach would be to treat benchmark scores as experimental observations, and try to form predictive models which take in descriptions of networks and output predicted benchmark scores. In the architecture analogy, this would be like modelling the strength of various materials and shapes. If good predictive models are found, they can be used to design networks which are predicted to have desirable scores, in the same way that buildings can be designed based on predictions of how the materials and geometry will behave.
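As a toy sketch of what I mean, here's roughly how that could look in Python. Everything in it is an assumption made up for illustration: the (depth, width) "description" of a network, the hidden linear rule that generates the scores, and the helper names `describe`, `fit_least_squares` and `predict`. Real benchmark scores certainly won't be this well behaved; the point is only the shape of the workflow (observe, fit, predict an untried design, check).

```python
# Toy sketch: treat benchmark scores as observations and fit a predictive model.
# All data here is synthetic and hypothetical; these are not real benchmark results.

def describe(depth, width):
    """Hypothetical network description: a bias term, the depth and the width."""
    return [1.0, float(depth), float(width)]

def synthetic_score(depth, width):
    """Stand-in for 'train the network and run the benchmark' (made-up rule)."""
    return 0.6 + 0.02 * depth + 0.0005 * width

def fit_least_squares(rows, targets):
    """Fit weights w minimising sum((w . x - y)^2) via the normal equations."""
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(n):
            if k != col:
                factor = a[k][col] / a[col][col]
                a[k] = [x - factor * p for x, p in zip(a[k], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]

def predict(weights, depth, width):
    """Predicted score for a network we have not 'run' yet."""
    return sum(w * x for w, x in zip(weights, describe(depth, width)))

# "Experiments" we have already run: (depth, width) pairs and their scores.
designs = [(2, 64), (4, 64), (2, 128), (4, 128), (8, 128)]
rows = [describe(d, w) for d, w in designs]
scores = [synthetic_score(d, w) for d, w in designs]

weights = fit_least_squares(rows, scores)

# Test the model on a design that was held out of the fit.
held_out = (8, 64)
predicted = predict(weights, *held_out)
error = abs(predicted - synthetic_score(*held_out))
```

The held-out check is the "scientific" part: the model only earns trust if it predicts designs it was not fitted on. A linear model over two hand-picked features is just the simplest thing that could work here, not a claim about what a good model would look like.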

Of course, to be more useful we'd also want to take into account things like resource usage, training time, etc., and the models themselves must be constrained somehow to avoid trivial solutions like "run the given network, see how it behaves, and give that as our prediction".