
by simonw 786 days ago

Training doesn't work like that. Just because a model has been exposed to text in its training data doesn't mean the model will "remember" the details of that text.

Llama 3 was trained on 15 trillion tokens, but I can download a version of that model that's just 4GB in size.
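To put rough numbers on that, here is a back-of-the-envelope sketch. The 15 trillion tokens and 4GB figures come from above; the 4 bytes of text per token is my own ballpark assumption, and I'm treating 4GB as 4e9 bytes.

```python
TRAINING_TOKENS = 15e12   # from the comment: 15 trillion tokens
MODEL_BYTES = 4e9         # from the comment: 4GB download (decimal GB assumed)
BYTES_PER_TOKEN = 4       # assumption: rough average size of a token of text


def bits_per_token(model_bytes, tokens):
    """Bits of model weights available per training token seen."""
    return model_bytes * 8 / tokens


def text_to_model_ratio(model_bytes, tokens, bytes_per_token):
    """How many times larger the raw training text is than the model file."""
    return tokens * bytes_per_token / model_bytes


if __name__ == "__main__":
    print(f"bits of weights per training token: "
          f"{bits_per_token(MODEL_BYTES, TRAINING_TOKENS):.4f}")
    print(f"training text is roughly "
          f"{text_to_model_ratio(MODEL_BYTES, TRAINING_TOKENS, BYTES_PER_TOKEN):,.0f}x "
          f"the size of the model file")
```

Under those assumptions the weights have a tiny fraction of a bit to spend per training token, which is far too little to store the text verbatim.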

No matter how "big" your model is, there is still scope for techniques like RAG if you want it to return answers grounded in actual text, as opposed to often-correct hallucinations spun up from the giant matrices of numbers in the model weights.
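For anyone unfamiliar with the shape of RAG, here is a toy sketch of the idea: find the most relevant passage, then paste it into the prompt so the model answers from real text. Everything here is hypothetical and simplified. The word-overlap scoring stands in for the embedding search a real system would likely use, the `PASSAGES` list is made up, and the final call to a model is left out entirely.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, passage):
    """Count the words shared by the query and the passage (toy relevance)."""
    q = Counter(tokenize(query))
    p = Counter(tokenize(passage))
    return sum(min(q[w], p[w]) for w in q)


def retrieve(query, passages, k=1):
    """Return the k passages with the highest word overlap with the query."""
    ranked = sorted(passages, key=lambda p: score(query, p), reverse=True)
    return ranked[:k]


def build_prompt(query, passages, k=1):
    """Assemble a prompt that asks the model to answer from retrieved text."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical documents, invented for illustration.
PASSAGES = [
    "The office closes at 6pm on Fridays.",
    "Expense reports are due on the first Monday of each month.",
    "The cafeteria serves lunch from noon until 2pm.",
]

if __name__ == "__main__":
    print(build_prompt("When are expense reports due?", PASSAGES))
```

The point is that the answer comes from the retrieved passage sitting in the prompt, not from whatever the weights happen to have retained.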