by imtringued 483 days ago
This is in fact a big problem with fine-tuning most models. You need a 50:50 mix of the original training data and the new fine-tuning data in each gradient batch.
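
To make the 50:50 mix concrete, here is a rough sketch of a batch builder that draws half of every batch from the original data and half from the new data. The function name `mixed_batches` and its parameters are made up for illustration, and it samples with replacement, which is only one of several reasonable choices.

```python
import random

def mixed_batches(original, new, batch_size, steps, seed=0):
    """Yield batches made of half original samples and half new samples.

    Hypothetical helper: `original` and `new` are plain lists of examples.
    """
    if batch_size % 2 != 0:
        raise ValueError("batch_size must be even for an exact 50:50 split")
    rng = random.Random(seed)
    half = batch_size // 2
    for _ in range(steps):
        # Sample with replacement so a small fine-tuning set can still
        # fill its half of every batch.
        batch = rng.choices(original, k=half) + rng.choices(new, k=half)
        rng.shuffle(batch)  # avoid a fixed original-then-new ordering
        yield batch
```

Each yielded batch would then go through a single gradient step, so every update sees both distributions.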

Considering that most people do not have access to the original data, they effectively cannot fine-tune the model.

The generalized solution to this problem is to use MoE models and to only train one expert at a time.
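
As a toy illustration of "only train one expert at a time", the sketch below applies a gradient step to one named expert and leaves the others frozen. It is framework-free on purpose: the dict-of-lists layout and the name `sgd_step_one_expert` are assumptions for the example, not how any particular MoE library stores its weights.

```python
def sgd_step_one_expert(experts, grads, active, lr=0.1):
    """Return new expert weights where only `experts[active]` is updated.

    Hypothetical layout: `experts` and `grads` map expert name -> list of floats.
    """
    if active not in experts:
        raise KeyError(f"unknown expert: {active}")
    updated = {}
    for name, weights in experts.items():
        if name == active:
            # Plain SGD update for the one expert being trained.
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
        else:
            # Frozen expert: copy the weights unchanged, ignore its gradient.
            updated[name] = list(weights)
    return updated
```

In a real framework the same effect usually comes from disabling gradients on the frozen experts rather than copying weights by hand.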

1 comment

Is this true with Stable Diffusion, too? Because I've fine-tuned it with just a folder full of images and txt files with "keyword" as the only line in each text file.

No original weights, there...