by sp332 2683 days ago
Does someone have a description of the network somewhere? Does it use LSTM for memory or what? Is there anything unusual about the size or structure of the network? Does it use an attention mechanism?
1 comment

I would recommend reading the paper: https://d4mucfpksywv.cloudfront.net/better-language-models/l...

and the previous paper

https://s3-us-west-2.amazonaws.com/openai-assets/research-co...

It's a transformer, not an LSTM, and it's very large but not structured in a particularly unusual way.
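To make "transformer, not LSTM" a bit more concrete: a transformer has no recurrent memory cell and instead mixes information across positions with self-attention. Below is a rough, minimal sketch of single-head scaled dot-product attention with a causal mask, assuming the left-to-right masking typical of transformer language models. The function names and toy inputs are made up for illustration and are not taken from either paper; the real model also has learned projections, multiple heads, and many stacked layers.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    queries, keys, values: lists of float vectors, one per token.
    Position i may only attend to positions 0..i.
    """
    d = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # Score the current token against itself and earlier tokens only.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible values.
        out = [
            sum(w * values[j][k] for j, w in enumerate(weights))
            for k in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy input: three tokens with 2-dimensional vectors (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_self_attention(x, x, x)
```

The first token can only see itself, so its output equals its own value vector; later tokens get a blend of everything up to and including their own position. There is no hidden state carried from step to step, which is the main contrast with an LSTM.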