Hacker News new | ask | show | jobs
by imtringued 782 days ago
Something that appears to be hardly known is that the transformer architecture needs to become more compute bound. Inventing a machine learning architecture which is FLOPs heavy instead of bandwidth heavy would be a good start.

It could be as simple as using a CNN instead of a V matrix. Yes, this makes the architecture less efficient, but it also makes it easier for an accelerator to speed it up, since CNNs tend to be compute bound.