Hacker News
by jeremysalwen 2359 days ago
Not true. A transformer can be used in models without any lookahead, as it is in GPT-2, where a causal mask lets each position attend only to earlier positions. The real difference is the complexity of the model and the large increase in computational cost.
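To make the "no lookahead" point concrete, here is a minimal sketch of single-head causal self-attention in plain Python. It is not GPT-2's actual code: the function name `causal_attention` and the toy inputs are made up for illustration, and it leaves out the learned projections, multiple heads, and everything else a real model has.

```python
import math

def causal_attention(q, k, v):
    """Scaled dot-product attention where position i sees only positions 0..i.

    q, k, v are lists of n vectors (lists of floats); q and k share a dimension.
    Hypothetical helper for illustration, not taken from any library.
    """
    n = len(q)
    d = len(q[0])
    out = []
    for i in range(n):
        # Score position i against positions 0..i only; later positions are
        # never read, which is the "no lookahead" property.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the visible positions.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the visible value vectors.
        out.append([
            sum(w * v[j][c] for j, w in enumerate(weights))
            for c in range(len(v[0]))
        ])
    return out

# Toy check: changing the last token leaves the earlier outputs untouched.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x_changed = x[:2] + [[5.0, -3.0]]
y = causal_attention(x, x, x)
y_changed = causal_attention(x_changed, x_changed, x_changed)
print(y[:2] == y_changed[:2])
```

The pairwise loop is also where the cost shows up: every position is scored against every earlier one, so the work grows quadratically with sequence length.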