by YetAnotherNick 1025 days ago
I think they should focus more on finetuning. Finetuning an existing model is almost always better than pretraining from scratch, even if the pretraining dataset is very different from the finetuning dataset. If I could train a 30B model for $10 on a few tens of millions of tokens (basically proportional to the current rate), I would definitely use it.

You can already do that, afaik. HuggingFace even provides some nice notebook examples showing how to do it with AWS SageMaker and the HuggingFace libraries. You don't need anywhere near 100-1000 GPUs to fine-tune, which makes it a much easier problem to just run on existing clouds.
I know, and I use instances to train, but it would be a big improvement if all I needed to do was select HuggingFace datasets, click train, and get a model I could test in a playground.