Ask HN: Do modern AI engines still need to do full re-trainings?


11 points by zepearl 695 days ago

I learned about ~AI algorithms in the 90s: backpropagation & clustering networks, and a little bit of genetic algos.

I then focused on, programmed & played with the "backpropagation" network model for a while, until the early 2000s. It was fun, but not usable in my context, so I stopped fiddling with it and have been inactive in this area since.

An important property of a backpropagation network was (as far as I know) that it had to be fully re-trained whenever the inputs changed (values of existing ones changed, or inputs/outputs were removed/added).

Question:

Is it still like that for the currently fancy algos (the ones developed by Google/Facebook/OpenAI/Xsomething/...) or are they now better, so that they can now adapt without having to be fully retrained using the full set of (new/up-to-date) training data?

Asking because I lost track of the progress in this area over the last 20 years, and especially recently I understand nothing involving all the new names (e.g. "llama", etc.).

Thanks :)

2 comments

Micoloth 694 days ago

I think what you are referring to is the concept of “finetuning”. You use a pretrained network and add a (relatively) small set of new input-output pairs to steer it in a new direction.

It's widely used, you can look it up.
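To make the idea concrete, here is a toy sketch (my own illustration, not how any real LLM is trained): a one-feature linear model stands in for the "pretrained network", and fine-tuning just means continuing gradient descent from the existing weights on a handful of new pairs. All names and numbers are made up for the example.

```python
# Toy illustration only: a one-feature linear model stands in for a
# "pretrained network". Names and numbers are assumptions for the example.

def sgd(w, b, data, lr=0.05, epochs=1):
    """Plain per-sample gradient descent on squared error, starting from (w, b)."""
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

def mse(w, b, data):
    """Mean squared error of the model on a dataset."""
    return sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)

# "Pretraining": many samples drawn from y = 2x + 1.
old_data = [(k / 10, 2 * (k / 10) + 1) for k in range(-20, 21)]
w_pre, b_pre = sgd(0.0, 0.0, old_data, epochs=200)

# The world shifts slightly to y = 2x + 1.5, and only three samples exist.
new_data = [(-1.0, -0.5), (0.0, 1.5), (1.0, 3.5)]

# Fine-tuning: continue from the pretrained weights for a few passes.
w_ft, b_ft = sgd(w_pre, b_pre, new_data, epochs=5)

# Baseline: the same small budget, but starting from scratch.
w_sc, b_sc = sgd(0.0, 0.0, new_data, epochs=5)

print("fine-tuned loss:  ", mse(w_ft, b_ft, new_data))
print("from-scratch loss:", mse(w_sc, b_sc, new_data))
```

With the same small training budget, the run that starts from the pretrained weights ends up with a lower loss on the new data than the run that starts from zero, which is the whole point of reusing the old training.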

A more challenging idea is whether it is possible to reuse the pretrained weights when training a network with a different architecture (maybe a bigger transformer with more heads, or something).

AFAIK this is not common practice: if you change the architecture, you have to retrain from scratch. But given the cost of these trainings, I wouldn't be surprised if OpenAI & co. had developed some technique to do this, e.g. across GPT versions.


cuteboy19 694 days ago

Full arch changes are rare. Mostly you would just attach stuff on top or at the sides
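A rough sketch of what "attaching on top" can look like, assuming a tiny made-up base network (the weights, sizes and data here are hypothetical): the base stays frozen and only a new linear head is trained. "At the sides" would be adapter-style additions such as LoRA, which add small trainable matrices next to frozen weights; that part is not shown here.

```python
import math

# Hypothetical "pretrained" base: fixed (weight, bias) per hidden unit.
# These values are never updated below.
BASE_WEIGHTS = [(1.0, 0.0), (-0.5, 0.5)]

def base_features(x):
    """Frozen feature extractor playing the role of the pretrained network."""
    return [math.tanh(w * x + b) for w, b in BASE_WEIGHTS]

def predict(head, bias, x):
    """New head: a linear layer stacked on top of the frozen features."""
    return sum(h * f for h, f in zip(head, base_features(x))) + bias

def loss(head, bias, data):
    """Mean squared error of base + head on a dataset."""
    return sum((predict(head, bias, x) - y) ** 2 for x, y in data) / len(data)

def train_head(data, lr=0.1, epochs=500):
    """Gradient descent on the head only; the base is left untouched."""
    head, bias = [0.0] * len(BASE_WEIGHTS), 0.0
    for _ in range(epochs):
        for x, y in data:
            feats = base_features(x)
            err = predict(head, bias, x) - y
            head = [h - lr * err * f for h, f in zip(head, feats)]
            bias -= lr * err
    return head, bias

# New task the base was never trained for: predict the sign of x.
new_task = [(k / 4, 1.0 if k > 0 else -1.0) for k in range(-8, 9) if k != 0]

head, bias = train_head(new_task)
print("loss before:", loss([0.0, 0.0], 0.0, new_task))
print("loss after: ", loss(head, bias, new_task))
```

Only the head's few parameters change, so this is much cheaper than retraining the base, at the price of being limited to whatever features the base already computes.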


vasili111 694 days ago

Large language models are pre-trained by their creators on huge amounts of data.

In many cases you do not need to do anything to the LLM and can just use it as is.

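If it was not trained on data containing the information you are interested in, you can use a technique called RAG (Retrieval-Augmented Generation): relevant documents are looked up at query time and passed to the model together with your question, so the model itself is not retrained.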
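A minimal sketch of the retrieval half of RAG, under heavy simplifying assumptions: word-overlap cosine similarity stands in for a real embedding model, a Python list stands in for a vector store, and the documents and function names are invented for the example. The resulting prompt would then be sent to whatever LLM you use; that call is left out.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Very crude 'embedding': lowercase word counts with punctuation stripped."""
    return Counter(word.strip(".,?!") for word in text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Made-up private documents the model was never trained on.
documents = [
    "The office coffee machine is cleaned every Friday afternoon.",
    "Project Falcon has its first release in March.",
    "Expense reports must be filed within 30 days.",
]

def retrieve(question, docs, k=1):
    """Return the k documents most similar to the question."""
    q = bag_of_words(question)
    ranked = sorted(docs, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, docs):
    """Put the retrieved text in front of the question; the model stays unchanged."""
    context = "\n".join(retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("When is the first release of Project Falcon?", documents)
print(prompt)
```

Updating the knowledge is then just a matter of editing the document list; no weights are touched.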

You can also do fine-tuning, which is a kind of training, but on a small amount of data.
