by phowon 2686 days ago
The BERT paper also introduced BERT Base, which has 12 layers and approximately the same number of parameters as GPT, yet still outperforms GPT on GLUE.
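For anyone who wants to sanity-check the "approximately the same number of parameters" part, here is a rough back-of-the-envelope count. It is a sketch under assumptions that are not stated in the comment above: both models are taken to use 12 layers, hidden size 768, feed-forward size 3072 and 512 positions, with a 30,522-token vocabulary for BERT Base (uncased) and 40,478 for GPT. The helper name `transformer_params` is made up for illustration, and task-specific heads are not counted.

```python
def transformer_params(vocab, positions, layers=12, hidden=768, ffn=3072, extra=0):
    """Approximate parameter count of a standard Transformer stack (hypothetical helper)."""
    # Token and position embedding tables.
    embeddings = vocab * hidden + positions * hidden
    # Q, K, V and output projections, each with a bias.
    attention = 4 * (hidden * hidden + hidden)
    # Two-layer feed-forward block with biases.
    mlp = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    # Two LayerNorms per layer, each with a scale and a shift vector.
    norms = 2 * 2 * hidden
    return embeddings + layers * (attention + mlp + norms) + extra


HIDDEN = 768

# Assumed BERT-only extras: segment embeddings (2 types), an embedding
# LayerNorm, and the pooler dense layer.
bert_extra = 2 * HIDDEN + 2 * HIDDEN + (HIDDEN * HIDDEN + HIDDEN)

bert_base = transformer_params(vocab=30522, positions=512, extra=bert_extra)
gpt = transformer_params(vocab=40478, positions=512)

print(f"BERT Base: {bert_base:,}")
print(f"GPT:       {gpt:,}")
print(f"ratio:     {bert_base / gpt:.3f}")
```

Under these assumptions the two come out at roughly 110M and 117M. The per-layer cost is identical, so the gap is almost entirely the larger GPT vocabulary.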