by visarga 2475 days ago
I dunno, this time the text looks really good. I got as far as 5 or 6 sentences deep before it said anything silly. I would have been fooled if I had read it in real life.

My guess is that they will perfect the Transformer and its training process, curate the dataset and make this method really easy to use. Maybe it can do translation, math, even auto-complete code. And that is just from iterating further on the current formulation of the Transformer.

But it is also possible that it gets surpassed by something even better. A new language model could replace the inductive bias specific to the Transformer (the ability to "attend" to any part of the input text) with something more efficient, because Transformers are quite hard and expensive to train right now. Maybe the Transformer's inductive bias is too general (like a fully connected network) and needs too much data. With a slightly different idea it could be made much more efficient and probably more convincing.
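
To make the "attend to any part of the input" point a bit more concrete, here is a rough sketch of plain scaled dot-product self-attention in pure Python. It is only an illustration under simplifying assumptions: no learned projections, no multiple heads, and the names `softmax`, `self_attention` and the toy `tokens` vectors are made up for this example. What it shows is that every position scores every other position, so n tokens produce n * n scores.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    # Hypothetical helper: each query is compared against every key,
    # so the number of scores grows with len(queries) * len(keys).
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys
        ]
        w = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [
            sum(wj * v[c] for wj, v in zip(w, values))
            for c in range(len(values[0]))
        ]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

# Toy input: 3 "tokens", each a 2-dimensional vector (made-up numbers).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
```

Each row of `weights` is a distribution over all positions, with nothing ruled out in advance. That is the "too general" part: the model has to learn from data which positions matter, and the all-pairs scoring is a big part of why long inputs get expensive.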