by minimaxir
1976 days ago
With current improvements to training performance and parallelism (e.g. DeepSpeed: https://www.deepspeed.ai ), it wouldn't surprise me if training GPT-2 small from scratch becomes possible on a couple of 3080s in a matter of days, with GPT-2 XL not taking 10x longer.