by czr 2683 days ago
I would recommend reading the paper: https://d4mucfpksywv.cloudfront.net/better-language-models/l...

and the previous paper:

https://s3-us-west-2.amazonaws.com/openai-assets/research-co...

It's a transformer, not an LSTM, and while it's very large, it isn't structured in a particularly unusual way.
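For anyone unsure what "transformer, not LSTM" means in practice: instead of carrying a recurrent hidden state from token to token, each position directly attends over all earlier positions. Below is a minimal, pure-Python sketch of single-head causal scaled dot-product attention under simplifying assumptions. It omits learned projections, multiple heads, layer norm, and feed-forward layers, and the function names are illustrative, not taken from the paper's code.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(q, k, v):
    """Illustrative single-head attention with a causal mask (assumed simplification).

    q, k, v: lists of T vectors (lists of floats); q and k share dimension d.
    Position t may only attend to positions 0..t, never to future tokens.
    """
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Scaled dot-product scores against positions 0..t only (the causal mask).
        scores = [
            sum(qi * ki for qi, ki in zip(q[t], k[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible value vectors.
        out.append(
            [sum(w * v[s][j] for s, w in enumerate(weights)) for j in range(len(v[0]))]
        )
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = causal_self_attention(x, x, x)
```

Because of the mask, changing a later token cannot change the outputs for earlier positions, which is what lets the model be trained as a left-to-right language model while processing all positions in parallel.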