| HN Mirror

	by smonn_ 1149 days ago
	There's plenty of interesting neural network designs out there, but they're being overshadowed by transformers due to their recent success. I personally think that the main reason transformers work so well is that they actually step away from the multilayer perceptron stuff and introduce some structure and, in a way, sparsity.

3 comments

mumblemumble 1149 days ago

Also, multi-head attention strikes me as being about as close to how language semantics seems to actually work in human brains as I've seen.

Lots of caveats there, of course. First off, I don't know much about the neurology; I just have an amateur interest in second language acquisition research that sometimes brings me into contact with this sort of thing. On the ANN side, which is closer to my actual wheelhouse, we definitely don't have any way of knowing if the actual mechanism is all that close, and I'm guessing it probably isn't, since ANNs don't work that similarly to brains. Nor does it need to be, but, intuitively, there's still something promising about an ANN architecture that's vaguely capable of mimicking the behavior of modules in an existing system (human brains) that's well known to be capable of doing the job. I'm not super wild about the bidirectional recurrent layers, either, because they impose some restrictions that clearly aren't great, such as the hard limit on input size, et cetera. But it still strikes me as another big step in a good direction.

smonn_ 1149 days ago

I'm currently working on a variation of a spiking neural network that learns by making and purging connections between neurons, which so far has been pretty interesting, though I am having a hard time getting it to output anything more than just the patterns it recognised. I did play around with adding its outputs to the input list, making it sort of recurrent, but it's practically impossible to decode anything that's going on inside of the network. I'm thinking of tracking the inputs around to see what it's doing right now; it might be interesting to see it generate some sort of tree-like structure.

kyllo 1149 days ago

Are you familiar with the edge popup algorithm introduced in "What's Hidden in a Randomly Weighted Neural Network?" https://arxiv.org/abs/1911.13299v2

Seems relevant to what you're working on. It starts with a randomly initialized, overparameterized neural net, but instead of gradient descent backpropagation, it learns by deleting connection edges.
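To make the "learn by deleting edges" idea concrete, here's a toy sketch of just the selection step: the weights are random and stay fixed, each edge gets a score, and only the top-scoring fraction of edges is used in the forward pass. As I understand the paper, the scores themselves are what gets trained; that part is left out here, and the scores are simply random. The names `top_k_mask` and `masked_forward` are made up for illustration and aren't from the paper's code.

```python
import random


def top_k_mask(scores, keep_fraction):
    """Return a 0/1 mask that keeps the highest-scoring fraction of edges."""
    flat = sorted((s for row in scores for s in row), reverse=True)
    k = max(1, int(round(keep_fraction * len(flat))))
    threshold = flat[k - 1]
    return [[1 if s >= threshold else 0 for s in row] for row in scores]


def masked_forward(x, weights, mask):
    """Linear layer followed by ReLU, using only the edges the mask keeps."""
    out = []
    for w_row, m_row in zip(weights, mask):
        total = sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        out.append(max(0.0, total))
    return out


rng = random.Random(0)
n_in, n_out = 4, 3

# Random weights that are never updated (assumption: plain Gaussian init).
weights = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]

# One score per edge. In a real setup these would be learned; here they
# are random placeholders just to show the masking.
scores = [[rng.random() for _ in range(n_in)] for _ in range(n_out)]

mask = top_k_mask(scores, keep_fraction=0.5)
y = masked_forward([1.0, 0.5, -0.5, 2.0], weights, mask)

print(mask)
print(y)
```

Swapping in different scores changes which subnetwork is active without touching any weight, which is the part that seems closest to making and purging connections.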

smonn_ 1149 days ago

I haven't read it, thanks a lot! I'm probably going to use it in an essay I'm writing about the topic.

trashtester 1149 days ago

That's probably true for most kinds of NN architectures, including convolutional layers and older recurrent architectures (LSTM, etc). Fully connected networks do not seem to be a necessary, and certainly not an efficient, way to represent the mechanisms that operate in the "real world", so clever ways to make the networks sparse are an important key.

But it's equally important to create architectures that allow efficient backpropagation of errors.

It does seem like transformers are pretty good at both, already.

I kind of hope we're not getting something radically better anytime soon, because it seems like AGI is already approaching faster than we can prepare for.

Then again, I would expect that someone somewhere is already using transformer-based networks to develop some brand new architecture that does in fact provide such a leap.

neurobama 1149 days ago

>There's plenty of interesting neural network designs out there

Where could a person learn more about these?

uoaei 1149 days ago

It's less about enumerating the architectures that have been tried before, and more about recognizing the modularity of NN components and the different perspectives on what those modules might represent.
