by NaiveBayesian 692 days ago
I believe that counterexample only works in the limit where the sample size goes to infinity. Every finite sample will have μ≠0 almost surely. (Of course μ will still tend to be very close to 0 for large samples, just slightly off.)

So this means the sequence of μₙ will perform a kind of random walk that can stray arbitrarily far from 0 and is almost sure to eventually do so.


Fair point about the mean, but I don't see how the random walk causes the standard deviation to shrink towards zero.
I agree. The authors generate a dataset of a similar size to the original and then train on it repeatedly (e.g. for multiple epochs). That's not what you need to do in order to get a new model trained on the knowledge of the teacher. You need to ask the teacher to generate new samples every time, otherwise your generated dataset is not very representative of the totality of the teacher's knowledge. Generating samples every time would (in the infinite limit) solve the collapse problem.
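A toy version of that argument, as a rough sketch only: the "teacher" here is assumed to be a plain N(0, 1) Gaussian and the "student" just estimates its standard deviation, which is far simpler than training a real model. The function names and parameters are made up for illustration. It compares many epochs over one fixed teacher dataset with asking the teacher for a fresh batch at every step.

```python
import random
import statistics


def fit_fixed(teacher_sigma, n, epochs, rng):
    """Fit on one fixed teacher dataset; extra epochs add no new information."""
    data = [rng.gauss(0.0, teacher_sigma) for _ in range(n)]
    sigma = None
    for _ in range(epochs):
        sigma = statistics.stdev(data)  # same data, so the same estimate each epoch
    return sigma


def fit_fresh(teacher_sigma, n, steps, rng):
    """Ask the teacher for a fresh batch of n samples at every step."""
    data = []
    for _ in range(steps):
        data.extend(rng.gauss(0.0, teacher_sigma) for _ in range(n))
    return statistics.stdev(data)


def mean_abs_error(fit, trials=30, **kwargs):
    """Average |estimated sigma - true sigma| over several independent trials."""
    rng = random.Random(0)
    errs = [abs(fit(1.0, rng=rng, **kwargs) - 1.0) for _ in range(trials)]
    return statistics.fmean(errs)


if __name__ == "__main__":
    print("fixed dataset:", mean_abs_error(fit_fixed, n=20, epochs=50))
    print("fresh samples:", mean_abs_error(fit_fresh, n=20, steps=50))
```

In this toy the fixed-dataset error is stuck at whatever the first n samples gave, while the fresh-sample error keeps shrinking as more steps are taken. Whether that carries over to a real teacher/student setup is an assumption, not something this sketch shows.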
Agreed, that's what I struggle to see as well. It's not really clear why the variance couldn't stay the same or go to infinity instead. Perhaps it does follow from some property of the underlying Gamma/Wishart distributions.
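For what it's worth, here is a minimal sketch of the toy setup as I understand it, assuming each generation draws n samples from the previous fit and refits with the sample mean and the unbiased (n − 1) sample variance. Under that assumption σ²ₖ₊₁ = σ²ₖ · χ²ₙ₋₁/(n − 1). The factor has mean 1, so the variance does stay the same in expectation, but by Jensen's inequality its log has negative mean, so log σ² is a random walk with downward drift. The function name and parameters are my own, not taken from the paper.

```python
import random
import statistics


def recursive_fit(n_samples, n_generations, seed=0):
    """Repeatedly fit a Gaussian to samples drawn from the previous fit.

    Returns a list of (mu, sigma) pairs, starting from N(0, 1).
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    history = [(mu, sigma)]
    for _ in range(n_generations):
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        mu = statistics.fmean(samples)
        sigma = statistics.stdev(samples)  # square root of the unbiased (n - 1) variance
        history.append((mu, sigma))
    return history


if __name__ == "__main__":
    history = recursive_fit(n_samples=10, n_generations=300)
    for gen in (0, 10, 50, 100, 200, 300):
        mu, sigma = history[gen]
        print(f"generation {gen:4d}: mu = {mu:+.4f}, sigma = {sigma:.3e}")
```

With a small n_samples the printed sigma should typically shrink by several orders of magnitude over a few hundred generations. A larger n_samples slows the drift rather than removing it.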