by naillo
1300 days ago
They're motivating that choice with this paper (https://arxiv.org/pdf/2203.15556.pdf), which shows that you can get better performance than GPT-3 with a much smaller model if you bump up the training time and training data by roughly 4x.
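
To put rough numbers on that trade-off, here is a minimal back-of-the-envelope sketch. It assumes the common rule of thumb that training compute is about 6 x parameters x tokens, and it uses approximate publicly reported model sizes and token counts. The names `train_flops`, `MODELS`, and `summarize` are made up for illustration, and none of this is taken from the paper's own code.

```python
# Back-of-the-envelope comparison of model size vs. training data.
# Assumption: training compute C ~= 6 * N * D FLOPs, where N is the
# parameter count and D is the number of training tokens. This is a
# widely used approximation, not an exact figure.

def train_flops(params: float, tokens: float) -> float:
    """Approximate training FLOPs under the 6 * N * D rule of thumb."""
    return 6.0 * params * tokens

# Approximate, publicly reported (parameters, training tokens).
MODELS = {
    "GPT-3": (175e9, 300e9),
    "Gopher": (280e9, 300e9),
    "Chinchilla": (70e9, 1.4e12),
}

def summarize(models: dict) -> dict:
    """Return tokens-per-parameter and approximate FLOPs for each model."""
    rows = {}
    for name, (params, tokens) in models.items():
        rows[name] = {
            "tokens_per_param": tokens / params,
            "flops": train_flops(params, tokens),
        }
    return rows

if __name__ == "__main__":
    for name, row in summarize(MODELS).items():
        print(f"{name:10s} tokens/param={row['tokens_per_param']:6.1f} "
              f"flops={row['flops']:.2e}")
```

Under these assumptions, Chinchilla is 4x smaller than Gopher but sees several times more tokens, so the two land at a similar estimated training compute.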
I think they're looking into larger models later, though.