Hacker News
by mumblemumble 1149 days ago
Also, multi-head attention strikes me as being about as close to how language semantics seems to actually work in human brains as I've seen.

Lots of caveats there, of course. First off, I don't know much about the neurology; I just have an amateur interest in second language acquisition research that sometimes brings me into contact with this sort of thing. On the ANN side, which is closer to my actual wheelhouse, we don't have any way of knowing whether the mechanism is all that close, and I'm guessing it probably isn't, since ANNs don't work all that similarly to brains. Nor does it need to be, but, intuitively, there's still something promising about an ANN architecture that's vaguely capable of mimicking the behavior of modules in an existing system (human brains) that's well known to be capable of doing the job. I'm not super wild about the bidirectional recurrent layers, either, because they impose some restrictions that clearly aren't great, such as the hard limit on input size, et cetera. But it still strikes me as another big step in a good direction.


I'm currently working on a variation of a spiking neural network that learns by making and purging connections between neurons, which so far has been pretty interesting, though I am having a hard time getting it to output anything more than just the patterns it recognised. I did play around with adding its outputs to the input list, making it sort of recurrent, but it's practically impossible to decode anything that's going on inside the network. Right now I'm thinking of tracking the inputs as they move through it to see what it's doing; it might be interesting to see it generate some sort of tree-like structure.
Are you familiar with the edge-popup algorithm introduced in "What's Hidden in a Randomly Weighted Neural Network?" https://arxiv.org/abs/1911.13299v2

Seems relevant to what you're working on. It starts with a randomly initialized, overparameterized neural net, but instead of training the weights with gradient descent, it leaves them fixed and learns which connection edges to keep and which to delete.
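
In case it helps, here's a rough toy sketch of the idea as I understand it from the paper, not the authors' code. The weights stay frozen, every edge gets a learnable score, the forward pass uses only the top-k edges by score, and the scores are updated with a straight-through gradient. The function names (top_k_mask, forward, score_step), the single linear unit, the squared-error loss, and all the numbers are my own assumptions for illustration.

```python
# Toy sketch of edge-popup-style learning on a single linear unit.
# All names and values here are hypothetical; this is not the paper's code.

def top_k_mask(scores, k):
    """Return a 0/1 mask that keeps the k highest-scoring edges."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    return [1 if i in keep else 0 for i in range(len(scores))]


def forward(x, weights, mask):
    """Output of the unit using only the edges the mask keeps."""
    return sum(m * w * xi for m, w, xi in zip(mask, weights, x))


def score_step(x, target, weights, scores, k, lr):
    """One update of the edge scores for the loss 0.5 * (y - target)**2.

    The weights are never modified. The top-k selection is treated as the
    identity in the backward pass (straight-through), so every edge's score
    moves by -lr * dL/dy * x_i * w_i, including currently pruned edges.
    """
    y = forward(x, weights, top_k_mask(scores, k))
    grad_y = y - target
    return [s - lr * grad_y * xi * w for s, xi, w in zip(scores, x, weights)]


weights = [1.0, -1.0, 2.0, 0.5]   # fixed weights standing in for a random init
scores = [0.4, 0.3, 0.2, 0.1]     # one learnable score per edge
x, target, k = [1.0, 1.0, 1.0, 1.0], 2.5, 2

before = forward(x, weights, top_k_mask(scores, k))
scores = score_step(x, target, weights, scores, k, lr=0.1)
after = forward(x, weights, top_k_mask(scores, k))
print(before, after)  # 0.0 3.0
```

In this toy case a single step swaps the edge with weight -1.0 for the edge with weight 2.0, so the output moves from 0.0 to 3.0 against a target of 2.5. It isn't guaranteed to settle on the best subset; the point is only that the selection of edges is what gets learned, while the weights themselves never change.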

I haven't read it, thanks a lot! I'm probably going to use it in an essay I'm writing about the topic.