| HN Mirror



	by opisthenar84 1137 days ago
	With 75B tokens, would it be better to train from scratch and then apply instruction fine-tuning? Training on top of LLaMA seems like it could introduce a lot of unexpected behavior.

1 comment

75B tokens is not really enough data to train a capable model from scratch. LLaMA was trained on 1 to 1.4 trillion tokens, depending on model size.

And yes, training on top of LLaMA could introduce a lot of unexpected behavior, but that's just where the state of the art is today.
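
To put the two token counts side by side, here is a rough back-of-the-envelope sketch. It assumes a round 1 trillion tokens for LLaMA (the lower end of what's cited above), and the names `FROM_SCRATCH_TOKENS`, `LLAMA_TOKENS`, and `budget_fraction` are made up for illustration.

```python
# Rough comparison of the token budgets mentioned in this thread.
# Assumption: LLaMA's training set is taken as a round 1 trillion tokens.
FROM_SCRATCH_TOKENS = 75e9   # budget from the question
LLAMA_TOKENS = 1.0e12        # assumed reference figure for LLaMA


def budget_fraction(available: float, reference: float) -> float:
    """Return the available token budget as a fraction of the reference run."""
    return available / reference


fraction = budget_fraction(FROM_SCRATCH_TOKENS, LLAMA_TOKENS)
print(f"75B tokens is about {fraction:.1%} of a 1T-token run")
```

So a from-scratch run on this data would see well under a tenth of the tokens LLaMA did, which is the gap the reply is pointing at.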