Japanese light novels were almost certainly in the training set, either in the original Japanese, in an English translation in that Books2 pile, or in a fan translation that happened to get scraped.
That's expected, and it's why the model can reproduce the basic details.
But those Japanese light novels don't have millions of forum discussions and essays written about them. So this shows how well the model can recall sparse data in its training dataset, rather than data that basically shows up 100,000 times in different forms.