by riku_iki
2684 days ago
I think the big deal is the size of the model: BERT large is roughly 300M params, and this one is 1.5B.
BERT was trained on a pod with 64 TPUs, and this model requires an even larger GPU/TPU cluster. There is no way an underfunded indie researcher can train such a model.
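
To put rough numbers on that, here is a back-of-the-envelope sketch using the round parameter counts quoted above. The 16 bytes per parameter is my own assumption (fp32 weights, fp32 gradients, and two fp32 Adam moment buffers), and `training_state_gib` is a made-up helper name, so treat this as an estimate and not a measurement of either model.

```python
# Assumed bytes per parameter during training (not a measured figure):
# 4 (fp32 weights) + 4 (fp32 gradients) + 8 (two fp32 Adam moment buffers).
BYTES_PER_PARAM_TRAINING = 4 + 4 + 8


def training_state_gib(num_params: int) -> float:
    """Memory for weights, gradients and optimizer state, excluding activations."""
    return num_params * BYTES_PER_PARAM_TRAINING / 2**30


if __name__ == "__main__":
    # Round parameter counts as quoted in the comment.
    for name, n in [("BERT large", 300_000_000), ("1.5B model", 1_500_000_000)]:
        print(f"{name}: ~{training_state_gib(n):.1f} GiB before activations")
```

Under those assumptions the 1.5B model needs about five times the training state of BERT large before any activations are counted, which is already more than a typical single accelerator holds.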