| HN Mirror



	by fazzone 3533 days ago
	Well, a lot of the papers published in the field present results like "We designed a neural net to perform <task> and achieved X% accuracy." The design of the net is novel and interesting enough to merit its own publication. If there were some sort of theoretical framework, results like that would not be interesting, because presumably the theory would explain which NN architectures are good at different tasks and why. I think that we will get there eventually, but right now I don't think we have enough data for patterns to emerge and hint at some sort of Theory.


SomeStupidPoint 3533 days ago

But couldn't that just be, to use the language of my analogy, papers being published which confirm particle existence?

I mean, if physicists publish papers when they find things they expected to find (at least, the first instance of each kind of thing and thereafter, any novel improvement in their production), why wouldn't machine learning theorists?

That's precisely what I can't tell: is a paper that's "We found that architecture X performed task Y with score Z" the same as "We found particle X at energy level Y and have no idea what it is" or "We found particle X at energy level Y just as we were expecting"?

And a lot of the papers are "We changed architecture X to now have feature Y and got the expected improvement Z", where I doubly can't tell how expected the improvement was or how systematically improvements are being designed and implemented.


bbctol 3533 days ago

I think the issue here is that we're not discovering new architectures as one discovers particles, we're creating them. The creative space is infinite, and people are making subtle tweaks to neural network systems all the time. It's not science, it's engineering.

Right now, we're in the early stages of engineering, like architecture before modern physics: we know some things that work, and we have some good intuitions about why, but there's little solid foundation to tell us how to proceed. We take some loose inspiration from nature (replace tanh functions with rectifiers to mimic action potentials in the brain, build convolutional networks similar to the retina) and find that it's more effective, sometimes, and sometimes less. We also just try a lot of stuff in hopes that something will stick. It's not as if there are some real, true neural networks out there, waiting like particles to be discovered: everything in the neural network zoo was built by hand, maybe inspired by nature, and saved because it works well, or is at least interesting; other architectures are forgotten. What we'd like is engineering principles that we can understand, so trying to make a neural network better at function x is just a matter of adding more units here or editing a function there, not venturing out into the dark again. (Such a reductive set of explanations may not exist for cognition, which really worries people who liked computers for their predictability.)
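To make the tanh-versus-rectifier swap mentioned above concrete, here is a minimal sketch of the two activation functions. The function names are made up for illustration; nothing here is taken from a particular paper or library.

```python
import math

def tanh_unit(x: float) -> float:
    """Squashing activation: output always stays inside (-1, 1)."""
    return math.tanh(x)

def rectifier_unit(x: float) -> float:
    """Rectifier (ReLU): zero below the threshold, identity above it."""
    return max(0.0, x)

# Compare the two on a few sample inputs.
for x in (-2.0, 0.0, 0.5, 3.0):
    print(f"x={x:+.1f}  tanh={tanh_unit(x):+.3f}  rectifier={rectifier_unit(x):+.3f}")
```

The rectifier stays silent for negative input and grows without bound for positive input, which is the loose "fires or doesn't fire" resemblance; whether that is why it tends to train better is exactly the kind of thing we lack principles for.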


chriswarbo 3532 days ago

Another issue with attempting to unify existing results is the focus on good performance, and the higher-level optimisation being performed by the researchers/implementors. This is partly because of the focus on engineering, as you say; I'd wager it's also due to a 'file drawer effect', where the emphasis is on achieving ever-higher benchmark scores, and that rewards tweaking of algorithms.

I suppose the alternative, more scientific/less engineering approach would be to treat benchmark scores as experimental observations, and try to form predictive models which take in descriptions of networks and output predicted benchmark scores. In the architecture analogy, this would be like modelling the strength of various materials and shapes. If good predictive models are found, they can be used to design networks which are predicted to have desirable scores, in the same way that buildings can be designed based on predictions of how the materials and geometry will behave.

Of course, to be more useful we'd also want to take into account things like resource usage, training time, etc., and the models themselves must be constrained somehow to avoid trivial solutions like "run the given network and see how it behaves, give that as our prediction".
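As a rough sketch of what "treat benchmark scores as observations and fit a predictive model" could look like, here is about the simplest possible version: a least-squares line from the log of a network's parameter count to its score. Everything in it is an assumption for illustration. The `observations` are invented placeholder numbers, not real benchmark results, and `fit_line` and `predict_score` are hypothetical helper names. A serious attempt would need richer network descriptions and held-out networks to test the predictions against.

```python
import math
from statistics import mean

# Invented placeholder observations (NOT real benchmark results):
# each record describes a network by parameter count and gives its score.
observations = [
    {"params": 1e5, "score": 0.50},
    {"params": 1e6, "score": 0.60},
    {"params": 1e7, "score": 0.70},
    {"params": 1e8, "score": 0.80},
]

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def predict_score(params, slope, intercept):
    """Predict a score from the network description without running the network."""
    return intercept + slope * math.log10(params)

xs = [math.log10(o["params"]) for o in observations]
ys = [o["score"] for o in observations]
slope, intercept = fit_line(xs, ys)

# Predicted score for a hypothetical, not-yet-built network.
print(predict_score(3e6, slope, intercept))
```

The model only ever sees the description, never the trained network, which is one crude way of ruling out the "just run it" trivial solution.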
