Hacker News
by viraptor 799 days ago
The training data may not be HP itself. It may be millions of pages summarising/discussing/dissecting HP, which already contain the relationships spelled out better than in the book itself.

That's true, but the model still analyzed all that disparate information and produced a very detailed graph of the relevant relationships. If anyone can show that the graph itself was in the training data, then I would agree that it's not a good test.
> disparate information

I wouldn't call it disparate when there are about a dozen wikis each spelling it out like this: https://harrypotter.fandom.com/wiki/Severus_Snape

I'll eat my hat if multiple graphs almost exactly like this one weren't in the training data. This is like fandoms 101.
The frustrating thing about all this speculation is that we don't know what was in the training data, and I think we need to know that to have any meaningful discussion about it.
We should. However, in this case, isn't it a bit of a stretch to assume they didn't put just about everything in the training data?
It would have been fairly trivial to A/B test this, with the other arm asking the same question but without all the books in the context window.