by jprafael 1103 days ago
Given that "The Pile" includes PubMed and NIH as data sources, it would be unlikely for GPT4 not to use them at all. Even GPT3 uses Wikipedia which does have (mostly) factual data with cited sources.
3 comments

> Even GPT3 uses Wikipedia which does have (mostly) factual data with cited sources.

There's a LOT of stuff on Wikipedia where the source is a link to some random, long article and it's unclear where exactly the cited information comes from. It gets significantly worse for any "hot" topic.

It's lossy, though: it tries to remember everything and relate that to everything else. Performance would increase significantly if it were tuned for that use case.
Problem is, half of the publications or even more are pure garbage that cannot be reproduced, so there is that.