by optimalsolver 808 days ago
We can name these hypothetical objects Recursive Neural Networks.
3 comments

I know you're jesting, but RNNs are recursive along the sequence length, whereas I am describing recursion along the depth.
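One possible reading of the sequence-versus-depth distinction above, as a toy sketch. The scalar `cell`, its weights, and both function names are hypothetical illustrations and are not taken from the proposal being discussed.

```python
import math

def cell(h, x, w_h=0.5, w_x=1.0):
    """One shared nonlinear update; the weights are made-up toy values."""
    return math.tanh(w_h * h + w_x * x)

def recurrent_over_sequence(xs, h0=0.0):
    """Reuse the cell once per sequence position (the usual RNN picture)."""
    h = h0
    for x in xs:
        h = cell(h, x)
    return h

def recursive_over_depth(x, depth, h0=0.0):
    """Reuse the cell `depth` times on a single input (weight sharing across layers)."""
    h = h0
    for _ in range(depth):
        h = cell(h, x)
    return h
```

In this toy the two coincide only when the sequence repeats one input; the difference is which axis, position or depth, the shared weights are unrolled along.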
Recursive NNs are not the same as Recurrent NNs:

https://en.wikipedia.org/wiki/Recursive_neural_network

Well, ish. The article above explains that Recursive NNs are hierarchical whereas RNNs are linear. I guess the distinction is a little on the fine side.

Anyway carry on. Pedantic moment over.
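A rough sketch of the hierarchical-versus-linear distinction, assuming scalar "embeddings" and a made-up shared `compose` function (all names and weights here are hypothetical). The recursive net folds a tree bottom-up; the recurrent net folds a flat sequence left to right.

```python
import math

def compose(left, right, w_l=0.6, w_r=0.4):
    """Shared composition function; toy scalar weights, purely illustrative."""
    return math.tanh(w_l * left + w_r * right)

def recursive_nn(tree):
    """Recursive NN: fold a nested-tuple binary tree bottom-up."""
    if isinstance(tree, tuple):
        left, right = tree
        return compose(recursive_nn(left), recursive_nn(right))
    return float(tree)  # leaf: treated as an already-computed embedding

def recurrent_nn(xs, h0=0.0):
    """Recurrent NN: fold a flat sequence left to right."""
    h = h0
    for x in xs:
        h = compose(h, x)
    return h
```

With these definitions, a fully left-branching tree such as `(((0.0, 0.1), 0.2), 0.3)` gives the same result as `recurrent_nn([0.1, 0.2, 0.3])`, which is one way to see why the distinction is "on the fine side": the chain is a special case of the tree.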

The recursive neural networks described there are a failed academic project from more than a decade ago, predating modern deep learning. Basically everyone using the phrase "recursive NN" nowadays is probably just misspeaking for RNN. RNNs are also not linear.
I don't know about "everybody nowadays", but I remember Recursive Neural Nets as an architecture introduced by Christopher Manning with the argument that it was better suited to the hierarchical structure of language than existing architectures. I did find it a bit of a bad choice of name, given that it's so close to Recurrent Neural Nets. All this is from memory, though; I might check the internets later to see what I misremember.

RNNs are a large class of architectures of varying complexity, from Kalman filters to LSTMs. It's not clear to me exactly what the Wikipedia article means by "linear", but LSTMs, for example, treat their inputs as sequences and don't try to deconstruct them into parts, like e.g. Convolutional Neural Nets do. So maybe that's what's meant by "linear".

No opinion on the specifics of this distinction, but it's worth noting that in research, an awful lot of successful projects have their origins in failed projects of decades ago...
My experience working in machine learning academia is of an overfocus on failed projects from the 90s and early 00s that really only stopped in 2020+.

We can often trace back successful projects to failed precursors, but often the people behind the successful project are not even familiar with the failed precursor and the 'connection to the past' only really occurs in retrospect. See the 'adjoint state method' and connections with backprop.

This is sometimes true, sure. And often the older work has entered the general consciousness rather than being chased down by searching for specific cites. On the other hand, very little is truly new, and recency bias can lead you into all sorts of back-eddies.

Once the dust has settled, there are often much clearer through lines than it looked like at the time. It's hard to see when you are on the moving front though.

Depthwise RNN?
Like decode the next token, then adjust what you're paying attention to, then decode it again?
Isn't it the only way to, say, understand a pun?
That is exactly how LLM inference is performed, so I'm being cheeky (I'm 99% sure anyone proposing anything in this thread is someone handwaving based on limited understanding)
You would be wrong, but that is fine. Been working with attention since 2018.

Why assume I know little and leave snarky comments (and basically a repetition of the prior joke at that, subbing RNN for transformer)?

To playfully invite you to participate in the conversation further, so that I may humbly learn from you. "I don't know what you're talking about" seemed too spartan and austere and aggressive, and you reciprocated politely, if again sparsely, when the other person playfully invited you to elaborate.