Hacker News new | ask | show | jobs
by scoresmoke 500 days ago
GPT-2 follows the very well-studied architecture of Transformer decoder, so the outcomes of this study might be applicable to the more complicated models.