Hacker News
by whimsicalism 2081 days ago
I was sloppy in my skimming of the paper. On a closer read, it does actually seem quite different from the literature I mentioned (examples: RoBERTa, XLNet). I'll be reading it more carefully, but I can now better understand the comparison to GPT-3.