Hacker News


	by amasad 1183 days ago
	Broadly, finetuning is any training done after pretraining. Most of the time it is used to fit a narrower task. In our case, it used the same training objective as pretraining, but on data meant to be more representative of what Replit users like to code. However, we were surprised by how much it boosted overall performance. Best guess: a) it's novel data, and b) the model could take even more training!!
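For context on "the same training objective as the pretraining": assuming, as is standard for decoder-only code models, that this objective is next-token prediction, the loss is the mean negative log-likelihood of each token given the tokens before it. Pretraining and this kind of finetuning would then minimize the same quantity and differ only in the data fed to it. The minimal pure-Python sketch below uses a hypothetical bigram table in place of a real model; `next_token_nll`, `BIGRAM`, and `toy_predict` are illustrative names, not Replit code.

```python
import math


def next_token_nll(token_ids, predict):
    """Mean negative log-likelihood of each token given its prefix.

    `predict(prefix)` must return a dict mapping token id -> probability
    of that token coming next (a stand-in for a real language model).
    """
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to score a prediction")
    total = 0.0
    # Position 0 has no prefix to condition on, so scoring starts at 1.
    for t in range(1, len(token_ids)):
        probs = predict(token_ids[:t])
        total += -math.log(probs[token_ids[t]])
    return total / (len(token_ids) - 1)


# Hypothetical toy "model": a fixed bigram table keyed on the last token.
BIGRAM = {
    0: {0: 0.1, 1: 0.8, 2: 0.1},
    1: {0: 0.1, 1: 0.1, 2: 0.8},
    2: {0: 0.8, 1: 0.1, 2: 0.1},
}


def toy_predict(prefix):
    return BIGRAM[prefix[-1]]


loss = next_token_nll([0, 1, 2, 0], toy_predict)
print(round(loss, 4))  # 0.2231, i.e. -ln(0.8), since every transition has p = 0.8
```

Finetuning on new data would lower this number on sequences that look like the new data, without changing the formula itself.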


spenczar5 1183 days ago

How feasible and effective would it be to fine-tune a model against an organization's private source code, resulting in an "internal" model that knows how to work with that org's stuff?

Could you, say, fine-tune the model every week with the latest merges? Every hour?

pyth0 1183 days ago

Finetuning is a relatively quick process. Training the base model is the expensive part (it can take weeks and huge amounts of compute), whereas finetuning usually touches only the last few layers and can be done with far fewer resources. You could definitely have a "nightly" finetuned model that is retrained every day or so.
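To make "finetuning only the last few layers" concrete, here is a toy sketch under loudly stated assumptions: a hypothetical two-layer scalar model `pred = w_out * (w_base * x)`, where `w_base` stands in for the frozen pretrained layers and only `w_out` (the "head") receives gradient updates. The data, starting weights, learning rate, and function names are all made up for illustration; deep learning frameworks get the same effect by excluding frozen parameters from gradient computation or from the optimizer.

```python
def finetune_head_only(w_base, w_out, data, lr=0.05, steps=200):
    """Gradient descent on mean squared error, updating only w_out.

    w_base plays the role of the frozen pretrained layers: it is read
    but never modified.
    """
    for _ in range(steps):
        grad_out = 0.0
        for x, y in data:
            h = w_base * x        # frozen "body" feature
            pred = w_out * h      # trainable "head"
            # d/dw_out of (pred - y)^2 is 2 * (pred - y) * h
            grad_out += 2 * (pred - y) * h
        w_out -= lr * grad_out / len(data)
    return w_base, w_out


def mse(w_base, w_out, data):
    return sum((w_out * w_base * x - y) ** 2 for x, y in data) / len(data)


# Hypothetical new-task data following y = 3x.
data = [(1.0, 3.0), (2.0, 6.0), (-1.0, -3.0)]
w_base, w_out = 1.5, 0.5  # pretend these came from pretraining
before = mse(w_base, w_out, data)
w_base_new, w_out_new = finetune_head_only(w_base, w_out, data)
after = mse(w_base_new, w_out_new, data)
print(w_base_new, round(w_out_new, 3))  # 1.5 2.0: body unchanged, head adapted
```

The base weight comes back untouched; only the head moves, to the value that makes `w_out * w_base` match the new data.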

rattray 1183 days ago

Interesting. How would that work for a company that wanted to run its own Codex-style model on-prem, trained on its own code? Perhaps also trained on its dependencies?
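One possible first step for the setup rattray describes is gathering source files from the company's own repositories plus checked-out dependency directories into a single training corpus. This is a minimal sketch under assumptions: the directory layout, the `.py`-only extension filter, the size cap, and the function name `collect_corpus` are all hypothetical, and a real pipeline would likely add deduplication and filtering before any finetuning run.

```python
import tempfile
from pathlib import Path


def collect_corpus(roots, extensions=(".py",), max_bytes=1_000_000):
    """Return (relative_path, text) pairs for source files under each root.

    Files with other extensions, files larger than max_bytes, and files
    that are not valid UTF-8 are skipped.
    """
    docs = []
    for root in roots:
        root = Path(root)
        for path in sorted(root.rglob("*")):
            if not path.is_file() or path.suffix not in extensions:
                continue
            if path.stat().st_size > max_bytes:
                continue
            try:
                text = path.read_text(encoding="utf-8")
            except UnicodeDecodeError:
                continue
            docs.append((path.relative_to(root).as_posix(), text))
    return docs


# Usage with a throwaway layout: one internal repo and one dependency dir.
with tempfile.TemporaryDirectory() as tmp:
    internal, deps = Path(tmp, "internal"), Path(tmp, "deps")
    (internal / "app").mkdir(parents=True)
    deps.mkdir()
    (internal / "app" / "main.py").write_text("print('hi')\n", encoding="utf-8")
    (internal / "README.md").write_text("docs\n", encoding="utf-8")
    (deps / "util.py").write_text("def f(): pass\n", encoding="utf-8")
    corpus = collect_corpus([internal, deps])

print([p for p, _ in corpus])  # ['app/main.py', 'util.py']; README.md is filtered out
```

Passing dependency directories as extra roots is what would let the model also see the libraries the internal code calls into.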

naderkhalil 1183 days ago

That finetuning a smaller model leads to better performance seems like a significant finding, one that'll likely lead a lot of companies to fine-tune their own internal "ChatGPT"s.

sanderjd 1183 days ago

You seem to know your stuff some, so I'll ask you a question on this: Are there any good books on all the different approaches in this space, or is it all too new and fast moving for such a thing?

nl 1183 days ago

There are no books on large LMs, but almost any resource about neural networks covers fine-tuning. I like the fast.ai courses, and these do cover language models.

osanseviero 1183 days ago

You can also check out the Natural Language Processing with Transformers book.

titaniczero 1183 days ago

When you fine-tune it, do you train just the head/last few layers or do you also unfreeze the model afterwards and retrain the whole model with a very small LR for a few epochs?
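The two-stage schedule titaniczero describes (train only the head first, then unfreeze everything and continue with a smaller learning rate) can be sketched on a hypothetical three-parameter scalar model `pred = w_out * (w_base * x + b_base)`. The "pretrained" bias `b_base` is chosen so the head alone cannot cancel it, which leaves stage 2 something to fix. The model, data, learning rates, and step counts are all assumptions for illustration; this shows the shape of the schedule, not how Replit actually trained.

```python
def loss_and_grads(p, data):
    """MSE and its gradients for pred = w_out * (w_base * x + b_base)."""
    w_base, b_base, w_out = p
    loss = g_wb = g_bb = g_out = 0.0
    for x, y in data:
        h = w_base * x + b_base   # "body" feature
        err = w_out * h - y       # "head" output minus target
        loss += err ** 2
        g_wb += 2 * err * w_out * x
        g_bb += 2 * err * w_out
        g_out += 2 * err * h
    n = len(data)
    return loss / n, (g_wb / n, g_bb / n, g_out / n)


def two_stage_finetune(p, data, head_lr=0.05, head_steps=50,
                       full_lr=0.01, full_steps=200):
    w_base, b_base, w_out = p
    # Stage 1: body frozen; only the head weight is updated.
    for _ in range(head_steps):
        _, (_, _, g_out) = loss_and_grads((w_base, b_base, w_out), data)
        w_out -= head_lr * g_out
    stage1_loss, _ = loss_and_grads((w_base, b_base, w_out), data)
    # Stage 2: everything unfrozen, trained with a smaller learning rate.
    for _ in range(full_steps):
        _, (g_wb, g_bb, g_out) = loss_and_grads((w_base, b_base, w_out), data)
        w_base -= full_lr * g_wb
        b_base -= full_lr * g_bb
        w_out -= full_lr * g_out
    return (w_base, b_base, w_out), stage1_loss


# Hypothetical data (y = 3x) and "pretrained" body weights with a bias.
data = [(1.0, 3.0), (2.0, 6.0), (-1.0, -3.0)]
start = (1.5, 0.5, 0.5)
start_loss, _ = loss_and_grads(start, data)
params, stage1_loss = two_stage_finetune(start, data)
final_loss, _ = loss_and_grads(params, data)
print(start_loss, stage1_loss, final_loss)  # each loss lower than the one before
```

The small stage-2 learning rate keeps the body weights close to their "pretrained" values while still letting them adjust where the head alone falls short.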
