by gdiamos
240 days ago
They will hire anyone who can produce a model better than GPT5, which is the bar for fine-tuning. Otherwise, you should just use GPT5.

Preparing a few thousand training examples and pressing "fine-tune" can improve the base LLM in a few situations, but it can also make the LLM worse at other tasks in hard-to-understand ways that only show up in production, because you didn't build evals good enough to catch them.

It also has all of the failure modes of deep learning. There is a reason why deep learning training never took off the way LLMs did, despite many attempts at building startups around it.

Andrej Karpathy has a rant that captures some of the failure modes of fine-tuning: https://karpathy.github.io/2019/04/25/recipe/
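To make the eval point concrete, here is a minimal sketch of a regression check that compares a base model and a fine-tuned model on several task suites and flags any task where the fine-tuned one got worse. Everything in it is a hypothetical stand-in: `accuracy`, `find_regressions`, the `tolerance` value, the exact-match scoring, and the toy "models" (plain functions) are assumptions for illustration, not any vendor's API.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]      # (prompt, expected answer)
Model = Callable[[str], str]   # hypothetical: any function mapping prompt -> output


def accuracy(model: Model, examples: List[Example]) -> float:
    """Fraction of examples where the model output exactly matches the expected answer."""
    if not examples:
        return 0.0
    correct = sum(1 for prompt, expected in examples if model(prompt).strip() == expected)
    return correct / len(examples)


def find_regressions(
    base: Model,
    tuned: Model,
    suites: Dict[str, List[Example]],
    tolerance: float = 0.02,  # assumed threshold; pick one that fits your noise level
) -> Dict[str, Tuple[float, float]]:
    """Return {task: (base_acc, tuned_acc)} for tasks where tuned trails base by more than tolerance."""
    regressions = {}
    for task, examples in suites.items():
        base_acc = accuracy(base, examples)
        tuned_acc = accuracy(tuned, examples)
        if base_acc - tuned_acc > tolerance:
            regressions[task] = (base_acc, tuned_acc)
    return regressions


# Toy stand-ins for real model calls.
def base_model(prompt: str) -> str:
    return {"2+2": "4", "capital of France": "Paris"}.get(prompt, "")


def tuned_model(prompt: str) -> str:
    # Simulates a fine-tune that kept the target skill but lost an unrelated one.
    return {"2+2": "4"}.get(prompt, "")


suites = {
    "math": [("2+2", "4")],
    "geography": [("capital of France", "Paris")],
}
regressions = find_regressions(base_model, tuned_model, suites)
print(regressions)  # {'geography': (1.0, 0.0)}
```

The important part is that the suites cover tasks you did not fine-tune on; a check that only measures the target task cannot see this kind of regression.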
Depends on what you want to achieve, of course, but at this point in time I see fine-tuning primarily as a cost-saving measure: transfer GPT5-level skill onto a smaller model, where inference is then faster and cheaper to run. This of course slows down your innovation cycle, which is why imo it is generally not advisable.