Hacker News
by sinuhe69 800 days ago
It would be more impressive (and cleaner, btw) if it were fed fan-fiction books instead of the original books. Then we could see what it makes of the context and what it "borrows" from the training data.

Why fan-fiction? Well, I believe fan-fiction works are not famous enough to be included in any training corpus. But Harry Potter fan-fiction is plentiful enough to test the context limit. These works also share similarities with the originals and differ from them in places, so telling the two apart requires correct recall. That would be a good test, wouldn't it?

1 comment

Why would fanfiction not be famous enough to be included? There are huge archives of it online, which make for great sources of information. Archive of Our Own, for example, lists over 12 million works on its site.

It’s cheap to gather, its authors are unlikely to have any recourse, and it spans a huge range of quality.