


	by 317070 462 days ago
	I've been fine-tuning these models since before ChatGPT, and the one lesson I've learned is that by the time you have set up everything to fine-tune a model, you can expect a newer model to do as well with prompt-tuning. So, unless you hope to stay at the forefront (e.g. to be ahead of competitors), there has been no real reason to fine-tune for the last 4 years. At best you could hope to stay about 1-3 months ahead, depending on how fast you were at setting up your training. And if that is what you did hope to achieve, you needed to automate at a higher level, i.e. automate data collection and the collection of eval cases.

3 comments

nwienert 462 days ago

It feels like there should be a service where I just drag and drop a folder of examples and it fine-tunes the latest DeepSeek or whatever for me, and can even host it for me at some cost. I'd pay for that immediately, but last I checked there was nothing that really did that well (would love to be wrong).

arkmm 462 days ago

There are some options out there, depending on what type of task you're trying to fine-tune for. I think RL fine-tuning for DeepSeek, for example, isn't well developed yet, but you can fine-tune a small Llama model (~3B params) for classification or extraction tasks and it works really well. What sort of tasks were you looking at fine-tuning for?

nwienert 462 days ago

Code generation or question answering, but ideally with a 70B+ parameter model.

fragmede 461 days ago

Vibe coding has taken over for frontend dev, but outside that narrow band of very visible coding, most models aren't great at more esoteric programming languages. Even Swift gives Claude trouble. So the reason to fine-tune is simply that the best newest models still remain bad at things outside their comfort zone (how human).

317070 461 days ago

I take my quip both ways, so I would wager that even with finetuning, these models are only 1 generation ahead in esoteric language performance and therefore _still not very good_. Am I correct?

fragmede 461 days ago

Wanting it to be bad reeks of copium.

317070 459 days ago

Why would I want it to be bad? I'm afraid I don't understand what you mean.

fragmede 459 days ago

You wrote, emphatically, that it would be "still not very good". Why do you believe that it would still not be very good after training on a specific problem? LLMs aren't able to do things outside their training data, as vast as it is, but if something is in its training data, why are you so emphatic that it's still not very good? If I ask it to make something for which it just needs to copy out sample code, I'd expect it to be pretty good at that one very specific task.

m101 462 days ago

I feel like this is true, but it would be great if you could provide examples so we could get a better idea of why you think/know this.

317070 461 days ago

I work for DeepMind on Project Astra. Not to delve too deep into the confidential details of what capabilities I have been looking at, but it has been the theme since the Flamingo model that you only gain about 1 model-generation by fine-tuning versus prompt-tuning.
