HN Mirror



	by getnormality 98 days ago
	> I think most ML people now think of neural-network architectures as being, essentially, choices of tradeoffs that facilitate learning in one context or another when data and compute are in short supply, but not as being fundamental to learning.

	Is this a practical viewpoint? Can you remove any of the specific architectural tricks used in Transformers and expect them to work about equally well?

2 comments

musebox35 98 days ago

I think this question is one of the more concrete and practical ways to approach the problem of understanding transformers. Empirically, the current architecture is the one that converges best when trained by gradient descent. A different form might be possible, and even beneficial, once the core learning task is completed. Also, the requirements of iterated and continuous learning might lead to a completely different approach.


etiam 98 days ago

Did you see this one?

https://news.ycombinator.com/item?id=41732853
