| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by dTal 1100 days ago
	They "simplify" the training data, which they are vastly smaller than. LLMs are like compression algorithms. You could imagine feeding the training data back in, letting it guess the next token, and entropy coding the residual - this would result in an excellent compression ratio. This compression performance is a direct consequence of abstract features of the dataset that it has managed to encode - knowing that the capital of France is Paris allows you to make predictions about many sentences, not just "The capital of France is...".

1 comments

mxkopy 1100 days ago

True, but I still think there's some fallacy here. Are we sure that models of the world (i.e. understanding) are the only way to achieve compression?