Hacker News new | ask | show | jobs
by gillesjacobs 717 days ago
Thanks! The fine-tunes where done in 2019-21 on a 4xV100 server with hyperparameter search, so thousands of individual fine-tuned models were trained in the end. I used weights and biased for experiment dashboarding the hyperparam search, but the hardware was our own GPU server (no cloud service used).

I doubt you can fine-tune BERT-large on a phone. A quantized, inference optimised pipeline can be leaps and bounds more efficient and is not comparable with the huggingface training pipelines on full models I did at the time. For non-adapter based training you're going to need GPUs ideally.