by jasonphang
1461 days ago
I agree that the lack of benchmarks makes it hard to determine how valuable this model is. But on the topic of dropout, dropout has been dropped for the pretraining stage of several other large models. Off the top of my head: GPT-J-6B, GPT-NeoX-20B, and T5-1.1/LM.