by sailingparrot
1976 days ago
> You don't need a TPU cluster to train a working GPT-2 model [...] A free GPU on Colab gets you most of the way

I have a hard time believing you can really train it on a single V100, unless you are talking about an extremely scaled-down version of GPT-2 (large). If you could train it at all, it would be with a batch size so small (probably 1?) that it would hurt performance, and it would take months. Am I out of the loop somehow?

Edit: I was thinking about reproducing the training OpenAI did in their paper, i.e. redoing all of the pre-training, but I realized you might have been talking about training on a smaller custom dataset.
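To put a rough number on the batch-size concern, here is a back-of-the-envelope sketch. It assumes GPT-2 large has about 774M parameters and that mixed-precision Adam training needs roughly 16 bytes per parameter for weights, gradients and optimizer state. Both figures are my assumptions, not something from this thread. Activations are ignored, and they are the part that grows with batch size.

```python
# Rough, hedged estimate of training memory for GPT-2 large on one V100.
# Assumptions (not from the thread): ~774M parameters, mixed-precision Adam
# at ~16 bytes per parameter, activations ignored entirely.

PARAMS = 774_000_000

# fp16 weights (2) + fp16 grads (2) + fp32 master weights (4) + Adam m (4) + Adam v (4)
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4


def model_state_gib(params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Memory for weights, gradients and optimizer state, in GiB."""
    return params * bytes_per_param / 2**30


if __name__ == "__main__":
    state = model_state_gib(PARAMS)
    for gpu_gib in (16, 32):  # V100 ships in 16 GB and 32 GB variants
        headroom = gpu_gib - state
        print(f"V100 {gpu_gib} GB: model state {state:.1f} GiB, "
              f"{headroom:.1f} GiB left for activations")
```

On the 16 GB card that leaves only a few GiB for the activations of 1024-token sequences, which fits the "probably 1?" batch-size guess.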