Hacker News
by amasad 1135 days ago
Broadly, finetuning is any training done after pretraining. Most of the time the term is used for fitting a narrower task. In our case, it used the same training objective as pretraining, but on data meant to be more representative of what Replit users like to code. However, we were surprised by how much it boosted overall performance. Best guess: a) the data was novel and b) the model could take even more training!
3 comments

How feasible and effective would it be to fine-tune a model against an organization's private source code, resulting in an "internal" model that knows how to work with that org's stuff?

Could you, say, fine-tune the model every week with the latest merges? Every hour?

Finetuning is a relatively quick process. Training the base model is the expensive part (it can take weeks and huge amounts of compute), whereas finetuning usually updates only the last few layers and can be done with far fewer resources. You could definitely have a "nightly" finetuned model that is retrained every day or so.
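To make the "only the last few layers" idea concrete, here is a minimal sketch in plain Python. It is a toy, not how anyone trains a real code model: the two-weight model, the `finetune_head` name, and the data are all made up for illustration. The point is only that the frozen "base" weight is used in the forward pass but never updated, so each step is cheaper than training everything.

```python
# Toy two-layer linear model: prediction = head * (base * x).
# "base" stands in for the pretrained layers, "head" for the last layer.
# All names and numbers here are hypothetical, for illustration only.

def finetune_head(base, head, data, lr=0.01, steps=200):
    """Gradient descent on mean squared error, updating only `head`."""
    for _ in range(steps):
        grad_head = 0.0
        for x, y in data:
            hidden = base * x              # frozen layer: used, never updated
            error = head * hidden - y
            grad_head += 2 * error * hidden
        head -= lr * grad_head / len(data)
    return base, head

# Made-up data for the "new task": y = 3x.
data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0)]
base, head = finetune_head(base=2.0, head=0.5, data=data)
```

After the loop, `base` is still exactly its "pretrained" value and only `head` has moved to fit the new data. In a real framework the same effect usually comes from marking the base parameters as non-trainable rather than writing the update by hand.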
Interesting - how would that work for a company that wanted to run their own codex model, on-prem, trained on their own code? Perhaps also trained on their dependencies?
Finetuning a smaller model leading to better performance seems like a significant finding that'll lead to a lot of companies fine-tuning their own internal "ChatGPT"s.
You seem to know your stuff some, so I'll ask you a question on this: Are there any good books on all the different approaches in this space, or is it all too new and fast moving for such a thing?
There are no books on large LMs, but almost any resource about neural networks covers fine-tuning. I like the FastAI courses, and they do cover language models.
You can also check the book "Natural Language Processing with Transformers".
When you fine-tune it, do you train just the head/last few layers or do you also unfreeze the model afterwards and retrain the whole model with a very small LR for a few epochs?
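For anyone unfamiliar with the two-phase schedule the question describes, here is a rough sketch in plain Python. It is a toy under stated assumptions: the two-weight model, the `train` and `mse` helpers, the data, and the learning rates are all hypothetical, and it says nothing about what was actually done for the model discussed in this thread.

```python
# Toy model: prediction = head * base * x, trained with gradient descent on MSE.
# All names, data, and hyperparameters are made up for illustration.

def mse(params, data):
    return sum((params["head"] * params["base"] * x - y) ** 2
               for x, y in data) / len(data)

def train(params, data, lr, steps, trainable):
    """Update only the parameters named in `trainable`; others stay frozen."""
    params = dict(params)  # copy so the caller's weights are untouched
    for _ in range(steps):
        grads = {"base": 0.0, "head": 0.0}
        for x, y in data:
            error = params["head"] * params["base"] * x - y
            grads["base"] += 2 * error * params["head"] * x / len(data)
            grads["head"] += 2 * error * params["base"] * x / len(data)
        for name in trainable:
            params[name] -= lr * grads[name]
    return params

data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0)]
pretrained = {"base": 2.0, "head": 0.5}

# Phase 1: base frozen, train only the head.
phase1 = train(pretrained, data, lr=0.01, steps=10, trainable=["head"])

# Phase 2: unfreeze everything and continue with a much smaller learning rate.
phase2 = train(phase1, data, lr=0.001, steps=50, trainable=["base", "head"])
```

The small learning rate in phase 2 is the whole point of the schedule: the pretrained weights get nudged toward the new data without being overwritten.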