Same idea here? Larger models do a better job forgetting their training data and dropping their semantic priors. Perhaps another way of thinking through this is that larger models learn new information and drop old information faster. https://arxiv.org/abs/2303.03846