Hacker News
by 3rd3 78 days ago
Isn't that "scheduled sampling"? In that case they also shift the input distribution toward the model's own, which is possibly even more crucial than shifting the output distribution?
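For anyone unfamiliar with the term, here is a rough sketch of what scheduled sampling does to the inputs during training, as I understand it: at each step the decoder is fed either the ground-truth previous token or the model's own previous prediction, chosen by a coin flip. The names `scheduled_sampling_inputs`, `predict_next`, `teacher_prob`, and `bos` are made up for illustration, and `predict_next` is a toy stand-in for a real model, not any library's API.

```python
import random

def scheduled_sampling_inputs(targets, predict_next, teacher_prob, rng, bos=0):
    """Build the decoder inputs for one training sequence.

    targets      -- ground-truth tokens y_1..y_T
    predict_next -- hypothetical stand-in for the model: input token -> predicted next token
    teacher_prob -- probability of feeding the ground-truth previous token
    rng          -- a random.Random instance, so runs are reproducible
    bos          -- assumed start-of-sequence token
    """
    inputs = []
    prev_truth = bos  # ground-truth token from the previous step
    prev_pred = bos   # model's own prediction from the previous step
    for t, y in enumerate(targets):
        # The first step always gets the start token; afterwards, flip a coin.
        if t == 0 or rng.random() < teacher_prob:
            x = prev_truth   # teacher forcing: input comes from the data
        else:
            x = prev_pred    # input comes from the model's own output
        inputs.append(x)
        prev_pred = predict_next(x)
        prev_truth = y
    return inputs

rng = random.Random(0)
toy_model = lambda x: x + 1  # toy "model" that just increments the token id
print(scheduled_sampling_inputs([5, 6, 7, 8], toy_model, 1.0, rng))  # [0, 5, 6, 7]
print(scheduled_sampling_inputs([5, 6, 7, 8], toy_model, 0.0, rng))  # [0, 1, 2, 3]
```

With `teacher_prob` at 1.0 this is plain teacher forcing; as it is lowered over the course of training, more of the inputs come from the model's own predictions, which is the shift in input distribution I mean.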