| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by imtringued 782 days ago
	Something that appears to be hardly known is that the transformer architecture needs to become more compute bound. Inventing a machine learning architecture which is FLOPs heavy instead of bandwidth heavy would be a good start. It could be as simple as using a CNN instead of a V matrix. Yes, this makes the architecture less efficient, but it also makes it easier for an accelerator to speed it up, since CNNs tend to be compute bound.