Thanks for this explanation. I understood most of it but could you explain why you should normalize using 1/sqrt(n) and why doing so makes the result converge in distribution?
This holds for any n, which means that if you normalize by 1/sqrt(n) instead of 1/n, the "randomness" never vanishes, even as n grows without bound. If you normalize by a factor that shrinks more slowly than 1/sqrt(n) (say 1/n^(1/4)), the variance blows up. If you normalize by a factor that shrinks faster than 1/sqrt(n) (such as 1/n), the variance collapses to zero and you get something concentrated at a single point.
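If it helps to see this numerically, here is a rough simulation sketch. The choice of Uniform(0, 1) draws, the helper name scaled_sum_variance, and the trial counts are all my own assumptions for illustration. It estimates the variance of n^(-a) * sum(X_i - mu) for a = 0.25, 0.5 and 1, which in theory equals n^(1 - 2a) * sigma^2.

```python
import random
import statistics

def scaled_sum_variance(n, exponent, trials=1000, seed=0):
    """Empirical variance of n**(-exponent) * sum(X_i - mu), X_i ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    mu = 0.5  # mean of Uniform(0, 1); its variance is 1/12
    samples = []
    for _ in range(trials):
        centered_sum = sum(rng.random() - mu for _ in range(n))
        samples.append(centered_sum * n ** (-exponent))
    return statistics.variance(samples)

# Columns: exponent 0.25 (too weak), 0.5 (the CLT scaling), 1.0 (the 1/n average)
for n in (10, 100, 1000):
    row = [scaled_sum_variance(n, a) for a in (0.25, 0.5, 1.0)]
    print(n, [round(v, 4) for v in row])
```

As n grows, the first column should keep increasing, the middle one should hover around 1/12, and the last one should shrink toward zero.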
The CLT tells us more than that: it tells us how the randomness is distributed when n gets very large, which is pretty remarkable when you think about it. (It also holds under much weaker conditions than the ones I mentioned above; those assumptions are just probably the easiest to understand.)
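A quick way to check the distributional claim yourself is sketched below, under my own assumptions: the draws are Exponential(1), which is clearly skewed and has mean 1 and standard deviation 1, and fraction_within is just an illustrative helper name. It compares how often the normalized sum lands within k standard deviations of zero against what a standard normal would give.

```python
import random
from statistics import NormalDist

def fraction_within(k, n=300, trials=2000, seed=1):
    """Share of normalized sums (S_n - n*mu) / (sigma * sqrt(n)) with |value| <= k."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        total = sum(rng.expovariate(1.0) for _ in range(n))
        z = (total - n) / n ** 0.5  # Exponential(1): mu = 1, sigma = 1
        if abs(z) <= k:
            hits += 1
    return hits / trials

std_normal = NormalDist()  # mean 0, standard deviation 1
for k in (1, 2):
    normal_share = std_normal.cdf(k) - std_normal.cdf(-k)
    print(k, round(fraction_within(k), 3), round(normal_share, 3))
```

The simulated and normal shares should be close even though the individual draws look nothing like a bell curve.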