Given that "The Pile" includes PubMed and NIH as data sources, it's unlikely that GPT4 doesn't use them at all. Even GPT3 uses Wikipedia, which does have (mostly) factual data with cited sources.
> Even GPT3 uses Wikipedia which does have (mostly) factual data with cited sources.
There's a LOT of stuff on Wikipedia where the source is a link to some random, long article, and it's unclear where exactly the cited information is coming from. It gets significantly worse for any "hot" topic.
It's lossy, though: it tries to remember everything and relate that to everything else. Performance would increase significantly if it were tuned for that use case.