by gwern 170 days ago
So if it's not using attention, and it encodes the entire input into a single embedding and processes that in one go, I guess this is neither a Transformer nor an RNN but just an MLP?