| HN Mirror



	by YetAnotherNick 1025 days ago
	I think they should focus more on finetuning. Finetuning a pretrained model is almost always better than training from scratch, even if the pretraining dataset is very different from the finetuning dataset. If I could train a 30B model for $10 on a few tens of millions of tokens (basically proportional to the current rate), I would definitely use it.


marcinzm 1025 days ago

You can already do that, afaik. HuggingFace even provides some nice notebook examples showing how to do it with AWS SageMaker and the HuggingFace libraries. You don't need anywhere near 100-1000 GPUs to fine-tune, which makes it a much easier problem to just run on existing clouds.


YetAnotherNick 1025 days ago

I know, and I use instances to train, but it would be a big improvement if all I needed to do was select HuggingFace datasets, click train, and get a model I could test in a playground.
