

	by proc0 896 days ago
	Training a decently sized LLM takes hundreds of GPUs running for days or weeks, if not longer. Anything smaller is not as useful or "smart". That cost covers the size of the data, the cleaning of the data, and the training cycles, all of which require a lot of compute. There are models out there that were trained with less and used other LLMs to generate their training data (I think Alpaca and Vicuna are among them), but doing this is even more complicated. If you disregard the quality of the results, then yes, you can train a small LM on any data. I don't know what the threshold is for usefulness and coherence of the final model.
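
To put a rough number on the compute claim above, here is a back-of-envelope sketch. It uses the common rule of thumb that training a dense transformer costs about 6 FLOPs per parameter per token. Everything else is my assumption, not a measurement: the 7B-parameter and 1T-token figures are hypothetical, 312 TFLOP/s is the published BF16 peak of an A100, and 40% utilization is a guess that varies a lot in practice.

```python
# Back-of-envelope training cost, assuming the ~6 * N * D FLOPs rule of thumb.
# All concrete numbers below are illustrative assumptions.

SECONDS_PER_DAY = 86_400


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer."""
    return 6.0 * n_params * n_tokens


def gpu_days(total_flops: float, peak_flops_per_s: float, utilization: float) -> float:
    """GPU-days needed at a given peak throughput and sustained utilization."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    effective = peak_flops_per_s * utilization
    return total_flops / effective / SECONDS_PER_DAY


def wall_clock_days(total_gpu_days: float, n_gpus: int) -> float:
    """Wall-clock days, assuming perfect scaling across GPUs (optimistic)."""
    return total_gpu_days / n_gpus


# Hypothetical run: 7B parameters, 1T tokens, A100 BF16 peak, 40% utilization.
flops = training_flops(7e9, 1e12)
days_one_gpu = gpu_days(flops, peak_flops_per_s=312e12, utilization=0.40)
days_cluster = wall_clock_days(days_one_gpu, n_gpus=256)

print(f"total FLOPs:      {flops:.2e}")
print(f"GPU-days:         {days_one_gpu:,.0f}")
print(f"days on 256 GPUs: {days_cluster:.1f}")
```

Under those assumptions it comes out to a few thousand GPU-days, so roughly two weeks on 256 GPUs, and that is before counting data cleaning, failed runs, or evaluation.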