| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by parineum 204 days ago
	> so that the model isn't required to compress a large proportion of the internet into their weights. The knowledge compressed into an LLM is a byproduct of training, not a goal. Training on internet data teaches the model to talk at all. The knowledge and ability to speak are intertwined.