I agree. The authors generate a dataset of similar size to the original and then train on it repeatedly (e.g. for multiple epochs). That's not what you need to do to train a new model on the teacher's knowledge. You need to ask the teacher to generate new samples every time; otherwise the generated dataset is not very representative of the totality of the teacher's knowledge. Generating fresh samples every time would (in the infinite limit) solve the collapse problem.
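
To make the difference concrete, here is a toy sketch in Python. It is my own illustration, not the paper's setup: the "teacher" is assumed to be a Zipf-like categorical distribution over tokens, the "student" is just a frequency count (the maximum-likelihood fit), and all names (`make_teacher`, `train_fixed`, `train_fresh`, `compare`) are hypothetical.

```python
import random


def make_teacher(vocab_size=1000, exponent=1.2):
    """Toy teacher: a long-tailed (Zipf-like) distribution over token ids."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def fit_student(samples, vocab_size):
    """Toy student: maximum-likelihood estimate, i.e. normalized counts."""
    counts = [0] * vocab_size
    for token in samples:
        counts[token] += 1
    return [c / len(samples) for c in counts]


def total_variation(p, q):
    """Total variation distance between two distributions on the same support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def train_fixed(teacher, dataset_size, epochs, rng):
    """Sample one dataset from the teacher, then reuse it every epoch."""
    dataset = rng.choices(range(len(teacher)), weights=teacher, k=dataset_size)
    # Repeating the same samples does not change the fitted frequencies:
    # tail tokens missing from the dataset stay at probability zero.
    return fit_student(dataset * epochs, len(teacher))


def train_fresh(teacher, dataset_size, epochs, rng):
    """Ask the teacher for a new batch of samples at every epoch."""
    samples = []
    for _ in range(epochs):
        samples.extend(
            rng.choices(range(len(teacher)), weights=teacher, k=dataset_size)
        )
    return fit_student(samples, len(teacher))


def compare(seed=0, dataset_size=500, epochs=50):
    """Return (distance with fixed dataset, distance with fresh samples)."""
    rng = random.Random(seed)
    teacher = make_teacher()
    fixed = train_fixed(teacher, dataset_size, epochs, rng)
    fresh = train_fresh(teacher, dataset_size, epochs, rng)
    return total_variation(teacher, fixed), total_variation(teacher, fresh)


if __name__ == "__main__":
    tv_fixed, tv_fresh = compare()
    print(f"fixed dataset: TV distance to teacher = {tv_fixed:.3f}")
    print(f"fresh samples: TV distance to teacher = {tv_fresh:.3f}")
```

In this toy, extra epochs on the fixed dataset buy nothing, while the fresh-sample student keeps getting closer to the teacher as the number of batches grows. A real neural student adds optimization and capacity effects that this sketch ignores.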