| HN Mirror



	by tadala 664 days ago
	Everyone wants to use less compute to fit more in, but (obviously?) the solution will be to use more compute and fit less. Attention isn't (topologically) attentive enough. All these RNN-lite approaches are doomed: beyond saving costs, they're going to get cooked by some other arch, one even more expensive than transformers.

1 comments

falcor84 664 days ago

Would you mind expanding upon your thesis? If that compute and all those parameters aren't "fitting" the training examples, what is it that the model is learning, and how should that be analyzed?


ithkuil 664 days ago

I think there are two distinct areas. One is the building of the representations, which is achieved by fitting. The other is loosely defined as "computing", which is some kind of search for a path through representation space. All of that is wrapped in a translation layer that can turn those representations into stuff we humans can understand and interact with. Current transformer architectures achieve all of this to some extent, but I guess some believe they are not quite as effective at the "computation/search" stage.


falcor84 664 days ago

But how does it get good at "computing"? The way I see it, we either program them to do so manually, or we use ML, in which case the model "fits" the computation based on training examples or environmental feedback, no? What am I missing?


ithkuil 664 days ago

The distinction is fuzzy indeed, especially if anything that you "program in manually" has some parameters that are learned.

Conceptually we already have parts of the model that are not learned: the architecture of the model itself.
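
A toy sketch of that split, assuming nothing beyond plain Python (the names `model` and `fit` are made up for illustration and don't come from any real library): the functional form is written by hand and never changes, while only the two parameters are fitted to the examples.

```python
# Toy illustration only: the *form* of the model (a line, y = w*x + b)
# is fixed by hand; only w and b are learned from the data.

def model(x, w, b):
    # "Architecture": chosen by the programmer, never updated by training.
    return w * x + b

def fit(xs, ys, lr=0.05, steps=2000):
    # Learned part: plain gradient descent on mean squared error.
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [model(x, w, b) - y for x, y in zip(xs, ys)]
        grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(2 * e for e in errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Training examples generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit(xs, ys)
print(w, b)  # should land very close to 2 and 1
```

No amount of training here will turn the line into a parabola; that choice sits outside what gets fitted, which is the sense in which the architecture is "programmed in manually".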
