| HN Mirror

	by iflp 2105 days ago
	I appreciate the effort the authors put into this post, but this is like saying DNNs are stacked logistic regressions: the connection is superficial and doesn't lead to deep insights about how they really work.

3 comments

hobofan 2105 days ago

> deep insights about how they really work

It's not about "how they really work", but about what data they operate on and what problems they can be applied to. When I first heard the term "transformer" from a friend, it didn't trigger any association in my mind because it's a very opaque term, but once he explained it to me in terms of Graph Neural Networks, it clicked very quickly.

cjauvin 2105 days ago

I'm genuinely a bit surprised by that; it was always my high-level understanding of the essence of neural networks (at least vanilla feedforward ones). Would you care to elaborate?

woko 2105 days ago

I think this post means the same as this tweet:

> Transformers are a special case of Graph Neural Networks. This may be obvious to some.

https://twitter.com/OriolVinyalsML/status/123378359362695168...
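
One way to read the "special case" claim, as a rough sketch rather than anything taken from the post or the tweet: treat each token as a node, let every node attend to all of its neighbors, and aggregate the neighbors' features with softmax weights. On a fully connected graph this is single-head self-attention. The function names are hypothetical, and the learned query/key/value projections are left out (assumed to be the identity) to keep the example short.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_as_message_passing(nodes, neighbors):
    """One round of message passing with attention weights.

    nodes: list of feature vectors, one per token/node.
    neighbors: dict mapping node index -> list of neighbor indices.
    Assumption: queries, keys and values are the raw features
    (no learned projections).
    """
    dim = len(nodes[0])
    scale = math.sqrt(dim)
    updated = []
    for i, h_i in enumerate(nodes):
        nbrs = neighbors[i]
        # Attention weight of each neighbor j for node i.
        weights = softmax([dot(h_i, nodes[j]) / scale for j in nbrs])
        # Aggregated message: weighted sum of neighbor features.
        message = [
            sum(w * nodes[j][d] for w, j in zip(weights, nbrs))
            for d in range(dim)
        ]
        updated.append(message)
    return updated

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# A transformer layer corresponds to the fully connected graph:
# every token is a neighbor of every token, including itself.
fully_connected = {i: list(range(len(tokens))) for i in range(len(tokens))}
out = attention_as_message_passing(tokens, fully_connected)
```

Passing a sparser `neighbors` dict gives attention restricted to a graph's edges; the transformer case is the one where no edge is missing.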

iflp 2104 days ago

It depends on what kind of understanding you want to achieve. It can be helpful to think of DNNs as approximating their infinitely wide counterparts. Depending on how you handle certain scalings, they then act like a linear filter of the error signal in function space or, for single-hidden-layer networks at least, like an interacting particle system. In both cases you can use these analogies to understand the convergence of gradient descent training, although gaps from real-world practice exist.
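
For the "linear filter of the error signal" picture, a sketch of the usual statement, under assumptions the comment does not spell out: squared loss $\tfrac12\sum_i (f(x_i)-y_i)^2$, gradient flow with rate $\eta$, and a kernel $\Theta$ that stays fixed during training in the infinite-width limit. The symbols are chosen here for illustration.

```latex
\frac{\partial f_t(x)}{\partial t}
  = -\eta \sum_{i=1}^{n} \Theta(x, x_i)\,\bigl(f_t(x_i) - y_i\bigr),
\qquad
\Theta(x, x') = \nabla_\theta f(x)^{\top} \nabla_\theta f(x')
```

The update to the function at $x$ is a kernel-weighted sum of the training errors, which is the sense in which the network filters the error signal linearly.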

mhh__ 2105 days ago

To get the neural network to do something useful you need to formalize some way of training it too?

wnoise 2105 days ago

While they can be thought of as stacked regression, it's only logistic regression with one particular non-linearity. And for many non-linearities you'll have a hard time usefully interpreting them as a regression.
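
To make the "stacked regression" reading concrete, here is a minimal sketch, assuming sigmoid activations and made-up weights: each unit is one logistic regression on the previous layer's outputs, and a layer is several of them side by side. Swap the sigmoid for another non-linearity and the logistic-regression reading no longer applies. All names and numbers are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_unit(weights, bias, inputs):
    # One logistic regression: sigmoid of an affine function of the inputs.
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def layer(weight_rows, biases, inputs):
    # A layer is several logistic regressions sharing the same inputs.
    return [logistic_unit(w, b, inputs) for w, b in zip(weight_rows, biases)]

def mlp(layers, inputs):
    # Stacking: each layer regresses on the previous layer's outputs.
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(weight_rows, biases, activations)
    return activations

# Hypothetical weights: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[0.5, -1.0], [1.5, 0.25]], [0.0, -0.5]),
    ([[1.0, -2.0]], [0.1]),
]
output = mlp(layers, [1.0, 2.0])
```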

unishark 2104 days ago

I think the common ones have statistical interpretations (that predate deep learning by a lot). Perhaps the one for the rectified linear unit is pretty obscure, but as I understand it, the statistics concept is called the "Tobit" model. Its meaning is not so obscure though: it is just a prediction that can only be non-negative, which is pretty common for quantities like mass or energy.
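
A small sketch of that correspondence, assuming the standard form of the Tobit model (a latent linear predictor plus Gaussian noise, observed only after censoring at zero). With the noise switched off, the censored observation is exactly a ReLU of the linear predictor. The coefficients and inputs are made up for illustration.

```python
import random

def relu(z):
    return max(0.0, z)

def linear_predictor(x, beta, intercept):
    return intercept + sum(b * xi for b, xi in zip(beta, x))

def tobit_observation(x, beta, intercept, noise_sd, rng):
    # Latent outcome: linear predictor plus Gaussian noise.
    latent = linear_predictor(x, beta, intercept) + rng.gauss(0.0, noise_sd)
    # Observed outcome: the latent value censored at zero.
    return max(0.0, latent)

rng = random.Random(0)
beta, intercept = [2.0, -1.0], -0.5   # hypothetical coefficients
inputs = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0]]

# Noise-free case: same as applying a ReLU to the linear predictor.
relu_outputs = [relu(linear_predictor(x, beta, intercept)) for x in inputs]

# Noisy case: observations are still never negative.
samples = [tobit_observation(x, beta, intercept, 1.0, rng) for x in inputs]
```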

p1esk 2105 days ago

Same here, I’m not sure what he’s trying to say.

fizixer 2105 days ago

ML publication is a complete mess right now.

Anyone can claim anything as long as they do a write-up and include some equations and pretty plots.

It was hard enough 5 years ago to filter out the handful of good papers from the sea of bad research. Now it's getting nearly impossible.

czzr 2105 days ago

Somebody should train a model to do it.

SimplyUnknown 2104 days ago

You mean like arxiv-sanity? As I understand it, it trains an SVM on the papers you like and suggests papers that fall on the same side of the hyperplane. It could be used as a quality classifier by only liking high-quality work.
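
A toy sketch of that idea, not arxiv-sanity's actual code: the bag-of-words features, the hand-rolled hinge-loss trainer, and the example titles are all stand-ins chosen so the example runs with no dependencies. Liked papers get label +1, the rest -1, and candidates are ranked by their signed score relative to the hyperplane.

```python
def tokenize(text):
    return text.lower().split()

def featurize(text, vocab):
    # Bag-of-words counts; words outside the vocabulary are ignored.
    vec = [0.0] * len(vocab)
    for word in tokenize(text):
        if word in vocab:
            vec[vocab[word]] += 1.0
    return vec

def train_linear_svm(samples, labels, lr=0.1, reg=0.01, epochs=50):
    # Subgradient descent on the L2-regularized hinge loss.
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
            weights = [w * (1.0 - lr * reg) for w in weights]
            if margin < 1.0:
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
    return weights, bias

def score(text, vocab, weights, bias):
    x = featurize(text, vocab)
    return sum(w * xi for w, xi in zip(weights, x)) + bias

# Hypothetical library: +1 for liked papers, -1 for the rest.
library = [
    ("graph neural networks attention", 1),
    ("transformers attention graph", 1),
    ("blockchain hype tokens", -1),
    ("crypto tokens hype", -1),
]
words = sorted({w for title, _ in library for w in tokenize(title)})
vocab = {w: i for i, w in enumerate(words)}
X = [featurize(title, vocab) for title, _ in library]
y = [label for _, label in library]
weights, bias = train_linear_svm(X, y)

candidates = ["crypto hype blockchain", "attention on graph transformers"]
ranked = sorted(
    ((title, score(title, vocab, weights, bias)) for title in candidates),
    key=lambda pair: pair[1],
    reverse=True,
)
```

Whether this works as a quality filter depends entirely on the labels: the classifier only learns "similar to what was liked", so it is as good as the liking discipline behind it.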
