by phowon, 2686 days ago
The BERT paper also introduced BERT Base, which has 12 layers and approximately the same number of parameters as GPT, yet still outperforms GPT on GLUE.