Hacker News new | ask | show | jobs
by amber_raza 172 days ago
This is a fantastic critique. Spot on. Freshness without appraisal is just an accelerated firehose of noise.

1. The Garbage Filter: Right now, I rely on a strict Hierarchy of Evidence to mitigate this (prioritizing Cochrane/Meta-analyses over observational studies), but you are absolutely right that LLMs can miss fatal methodological flaws in a single, high-ranking paper.

2. The 'Critic' Agent: I’m currently experimenting with a secondary 'Critic' pass. This is an LLM agent specifically prompted to act as a skeptic/methodologist to flag limitations before the main synthesis happens.

3. Multi-discipline prompting: The prompt you provided is a great case study in persona-based auditing. I’d love to learn more about the specific 'disciplines' or archetypes you’ve found most effective at catching these flaws. That is exactly the kind of domain expertise I’m trying to encode into the system.

2 comments

The personas have to paper specific I believe, addressing the content and methods. I guess an LLM could do a once over of the paper or meta-analysis to determine the best discipline specific personas - but would be interesting to test that. But there are also the benefits of deep expertise and understanding a field for decades. For example, I know a set of authors who repeatedly find significant associations in a field in almost every study they do, whereas others have variable results. They also seem to ignore good studies that disagree with their hypotheses and use inferior studies that support their position in review papers - so I dont really trust their work. It would be great if an LLM could develop that kind of understanding and somehow deprecate a body of work that had inherent author or institutional biases - even though on the surface the review looks legitimate. For a meta-analysis it is often the papers that are omitted that are most telling. That means the LLM will need to redo the entire search and synthesis - yikes!
You just articulated the 'Holy Grail' of automated appraisal. Detecting bias across a career is a massive graph problem compared to checking a single paper. It essentially requires auditing an entire bibliography before synthesis.

I am adding 'Author Reputation/Bias Analysis' to the long-term roadmap. Thanks for the rigorous stress-test today.

How will you do this, one author I don't trust (sent them an error they missed in their paper - didnt correct it, has systemic bias in their writing) was invited to write a review article by the New England Journal of Medicine - has an excellent reputation for all the world to see.
You found the ultimate edge case. The 'Prestige Proxy' (NEJM = Truth) essentially masks that individual's actual track record.

While we might be able to detect 'Insular Citation Clusters' mathematically to flag systemic bias, no model can catch a private signal like an ignored email. It reinforces why the human expert is indispensable. The tool is a force multiplier for judgment, not a substitute.

I warn against prioritizing Cochrane. It will block essential information from surfacing. This holds science back for over a decade. The best way to make science emerge is to take peer-reviewed reviews and meta-analyses at face value. If a particular review is bad, it will soon be corrected by other reviews, so don't worry about it.
I really disagree with this and there is ample evidence that science is not "self-correcting". Read Retraction Watch. I personally wrote to a journal on 3 occassions and phoned them twice to alert them to an error in a paper that the authors were reluctant to own up to and correct. I had inside knowledge and was able to provide the evidence of the error. Journal did nothing, they passed the message on to a range of sub editors (which were a revolving door), no investigation, no response. Google the "reproduciblity crisis" including the coverage of the issue in Nature to see how uncorrecting medical science can be.

Regarding Cochrane. It is reliable if is says a treatment does work, or an exposure has an effect, sometimes they miss effects because they only rely on particular sources of evidence e.g. RCTs, they were wrong on effectiveness of masks. As an example of reasonably up to date and evidence based free review sources on line - see Stat Pearls.

I fully understand that various articles, even peer-reviewed ones, can be bogus, and some reviews can be bogus too when they demonstrate an unfair bias in selecting articles. Journal managers too can be altogether apathetic. Even so, it has been my experience that reviews over the long term converge to the truth.

As for individual studies, if a study is important, it often gets tested by others, although sometimes it doesn't, and then it's a decision-theoretic play.

Cochrane in my estimation examines things from very narrow angles, and this can miss wide-ranging applicability to the real world.

That is a fair distinction.

My default right now is Clinical Safety. I prioritize high-grade evidence to prevent harm at the bedside.

However, for Research/Discovery, you are absolutely right. Excessive 'Gatekeeping' can slow down innovation.

The long-term fix is likely a 'Filter Dial'. We need tight constraints for treatment decisions, but loose constraints for hypothesis generation. I plan to support both modes.