by dvt
811 days ago
I think you may be misunderstanding what fine-tuning does. It does not teach the model new knowledge. In fact, Meta has a paper out arguing that a data set of about 1,000 examples [1] is enough to achieve pretty good alignment (fine-tuning) results, so 100M is way overkill. For knowledge retrieval, you need RAG (usually via the context window).
[1] https://arxiv.org/pdf/2305.11206.pdf
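To make "RAG via the context window" concrete, here is a rough sketch of the idea: pick the stored text most relevant to the question and paste it into the prompt, rather than training it into the weights. Everything in it is an assumption for illustration: the function names (`tokenize`, `retrieve`, `build_prompt`), the toy documents, and the word-overlap scoring, which stands in for the embedding search a real system would more likely use. No model is called; it only builds the prompt.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase and split into alphanumeric words.
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve(query, documents, k=1):
    # Hypothetical scorer: count word occurrences shared with the query.
    # Real systems typically use embedding similarity instead.
    q = Counter(tokenize(query))

    def score(doc):
        d = Counter(tokenize(doc))
        return sum(min(q[w], d[w]) for w in q)

    return sorted(documents, key=score, reverse=True)[:k]

def build_prompt(query, documents, k=1):
    # The retrieved text goes into the context window; weights are untouched.
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Made-up documents, purely for illustration.
docs = [
    "The office coffee machine is descaled every Friday.",
    "Project Falcon ships on the first Monday of March.",
]
prompt = build_prompt("When does Project Falcon ship?", docs)
print(prompt)
```

The resulting prompt string is what you would send to the model, so the new fact only has to fit in the context window.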
|
LIMA demonstrated that instruction-tuning and output formatting can be trained with a limited number of samples, not that fine-tuning is incapable of adding new information to the model.
Fine-tuning may be sub-optimal compared to RAG in most cases, but it does work.