Hacker News
by nil-sec 2336 days ago
So you are saying the system memory is 300GB and you can train your model on the CPU instead? Well, yeah, you can always do that, but training will be slow because the model is not being trained on the GPU. What’s the point?
1 comments

It's not that slow. And you can use many TPUs together to make up the speed difference.
If that were the case, why would anyone buy GPUs? I invite you to retrain a state-of-the-art model of your choice on a CPU and see how far you get.
We fine-tuned GPT-2 1.5B for subreddit simulator using this technique. https://www.reddit.com/r/SubSimulatorGPT2Meta/comments/entfg...