|
|
|
|
|
by craigdalton
178 days ago
|
|
Excuse the blunt metaphor, but there is a risk here of turning on a fire-hose of "fresh" garbage. John Ioannidis, one of the doyens of evidence based medicine very persuasively argues - Why Most Published Research Findings Are False https://pmc.ncbi.nlm.nih.gov/articles/PMC1182327/ That is why platforms pay physicians/epidemiologists/ specialists in their field hundreds of dollars per hour to sort the good from bad papers. After my training as a doctor I did a Masters in Clinical Epidemiology and spent an afternoon each week in a tutorial that reviewed papers in the top journals - about 20-30% of them had major flaws that were either ignored or dismissed by the authors. It may be worse now. LLMs still have trouble picking up the subtleties of medical science and will miss papers with major flaws. I just did a test on a paper that is often quoted as providing evidence of excess cancer risk in communities living close to unconventional gas facilities. When I asked ChatGPT 5.2 to review the pape for evidence of increase cancer risk with a simple prompt it said the paper found such a risk. However, when I wrote a multi-discipline based prompt for 5.2 and Gemini 3 pro, it found the fatal flaw in the paper and advised it did not provide evidence. See the prompt and consider how the prompts would have to be individually developed for each paper and meta-analysis. For review of meta-analysis you would need prompts developed by expert methodologists and discipline specialists- here is the prompt that worked: You are an environmental epidemiologist and exposure scientist, critially review this papers claim that the measured levels of unconventional gas emissions provide evidence of excess cancer risk: https://link.springer.com/article/10.1186/1476-069X-13-82 |
|
1. The Garbage Filter: Right now, I rely on a strict Hierarchy of Evidence to mitigate this (prioritizing Cochrane/Meta-analyses over observational studies), but you are absolutely right that LLMs can miss fatal methodological flaws in a single, high-ranking paper.
2. The 'Critic' Agent: I’m currently experimenting with a secondary 'Critic' pass. This is an LLM agent specifically prompted to act as a skeptic/methodologist to flag limitations before the main synthesis happens.
3. Multi-discipline prompting: The prompt you provided is a great case study in persona-based auditing. I’d love to learn more about the specific 'disciplines' or archetypes you’ve found most effective at catching these flaws. That is exactly the kind of domain expertise I’m trying to encode into the system.