Hacker News new | ask | show | jobs
by imtringued 244 days ago
Only being able to run transformers is a silly concept, because attention consists of two matrix multiplications, which are the standard operation in feed forward and convolutional layers. Basically, you get transformers for free.
1 comments

devil is in the details