You should try asking it detailed questions about medicine. It’s fairly deep and broad in its medical “knowledge.” But it’s not good at deductive or inductive reasoning and has no agency, so it’s entirely unsurprising that it’s not good at differential diagnosis. We actually have good differential diagnosis expert systems; the challenge is getting the providers to input queries properly. I can imagine GPT4 acting as an intermediary between human natural language and expert systems to great effect. It seems rather rash to be judging these technologies based off a crappy web UI thrown on a chatbot a few months ago.
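To make the intermediary idea concrete, here's a rough sketch of the shape I have in mind, not a real system. Everything in it is made up for illustration: the `RULES` table is a toy stand-in for an actual expert system, and `extract_findings` uses keyword matching where the language model would sit, so that it runs offline.

```python
# Hypothetical toy knowledge base: diagnosis -> findings that support it.
# A real expert system would be far richer; this only shows the data flow.
RULES = {
    "influenza": {"fever", "cough", "myalgia"},
    "strep pharyngitis": {"fever", "sore throat"},
    "migraine": {"headache", "photophobia", "nausea"},
}

# Hypothetical map from free-text phrases to coded findings.
# In the imagined pipeline a language model does this translation step;
# plain substring matching stands in for it here.
PHRASES = {
    "temperature": "fever",
    "fever": "fever",
    "cough": "cough",
    "achy": "myalgia",
    "sore throat": "sore throat",
    "headache": "headache",
    "light hurts": "photophobia",
    "nausea": "nausea",
}


def extract_findings(note: str) -> set[str]:
    """Turn a free-text note into the coded findings the rule base expects."""
    text = note.lower()
    return {finding for phrase, finding in PHRASES.items() if phrase in text}


def rank_diagnoses(findings: set[str]) -> list[tuple[str, float]]:
    """Score each diagnosis by the fraction of its findings that are present."""
    scored = []
    for diagnosis, expected in RULES.items():
        overlap = len(findings & expected)
        if overlap:
            scored.append((diagnosis, overlap / len(expected)))
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    note = "Pt is achy, coughing, running a temperature."
    print(rank_diagnoses(extract_findings(note)))
```

The point is the split: the clinician types whatever they want, the translation layer produces the structured query, and the ranking still comes from the rule base rather than from the model.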
Providers don't input queries "properly" because differential diagnosis software is complicated, clunky, and not easy to use for the overworked clinician.
Given that "The Pile" includes PubMed and NIH as data sources, it seems unlikely that GPT4 makes no use of them at all. Even GPT3 uses Wikipedia which does have (mostly) factual data with cited sources.
> Even GPT3 uses Wikipedia which does have (mostly) factual data with cited sources.
There's a LOT of stuff on Wikipedia where the source is a link to some random, long article and it's unclear where exactly in it the cited information comes from. It gets significantly worse for any "hot" topic.
It's lossy, though: it tries to remember everything and relate that to everything else. Performance would increase significantly if it were tuned for that use case.