Hacker News
by katet 89 days ago
Not that I've had to deal with this specifically, but I have noticed how the phrasing of my prompts pushes the LLM in different directions. I just tried a quick test on `duck.ai` with GPT-4o mini, using these two prompts:

A: Why is drinking coffee every day so good for you?

B: Why is drinking coffee every day so bad for you?

The answer to A says coffee has "several health benefits": antioxidants, liver health, and reduced risk of diabetes and Parkinson's.

The answer to B says it may lead to sleep disruption, digestive issues, and risk of osteoporosis.

Same question. One word difference. Two different directions.

This makes me take everything with a pinch of salt when I ask something like "Would Library A be a good fit for Problem X?", which is obviously a bit leading. I don't even trust what I hope are more neutral inputs, such as "How does Library A apply to Problem Space X?"
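One way to make this kind of check repeatable is to send the same topic through a positively framed, a negatively framed, and a neutrally framed prompt, then compare the answers side by side. A minimal Python sketch, assuming a hypothetical `ask(prompt) -> str` function that wraps whichever chat model you're testing (duck.ai itself isn't called here); the neutral wording is just an illustration:

```python
from typing import Callable, Dict


def framing_prompts(topic: str) -> Dict[str, str]:
    """Build a positive, a negative, and a neutral prompt about the same topic."""
    return {
        "good": f"Why is {topic} so good for you?",
        "bad": f"Why is {topic} so bad for you?",
        # Illustrative neutral phrasing, not taken from the original test.
        "neutral": f"What does the evidence say about the health effects of {topic}?",
    }


def compare_framings(topic: str, ask: Callable[[str], str]) -> Dict[str, str]:
    """Send each framing to `ask` (a hypothetical model wrapper) and collect the replies by label."""
    return {label: ask(prompt) for label, prompt in framing_prompts(topic).items()}


if __name__ == "__main__":
    # Stand-in "model" that just echoes the prompt, so the sketch runs offline.
    replies = compare_framings("drinking coffee every day", ask=lambda p: f"[reply to] {p}")
    for label, reply in replies.items():
        print(f"{label:>8}: {reply}")
```

If the "good" and "bad" replies disagree on the facts while the neutral one hedges, that's the framing effect showing through.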


Again, a model issue. At the risk of coming off as a thread-wide apologist, here are my results with Opus:

Good:

> The research is generally positive but it’s not unconditionally “good for you” — the framing matters.

> What the evidence supports for moderate consumption (3-5 cups/day): lower risk of type 2 diabetes, Parkinson’s, certain liver diseases (including liver cancer), and all-cause mortality…

Bad:

> The premise is off. Moderate daily coffee consumption (3-5 cups) isn’t considered bad for you by current medical consensus. It’s actually associated with reduced risk of type 2 diabetes, Parkinson’s, and some liver diseases in large epidemiological studies.

> Where it can cause problems: Heavy consumption (6+ cups) can lead to anxiety, insomnia…

These aren’t just my own one-off examples. Claude dominates the BSBench: https://petergpt.github.io/bullshit-benchmark/viewer/index.v...

The BSBench is such a fantastic resource - thank you for sharing.

We should really be citing this rather than anecdata every time someone brings up hallucinations.

For questions like these, I read what medical researchers have published. The first paper I read was https://pmc.ncbi.nlm.nih.gov/articles/PMC5696634/

> Coffee consumption was more often associated with benefit than harm for a range of health outcomes across exposures including high versus low, any versus none, and one extra cup a day. There was evidence of a non-linear association between consumption and some outcomes, with summary estimates indicating largest relative risk reduction at intakes of three to four cups a day versus none, including all cause mortality (relative risk 0.83, 95% confidence interval 0.83 to 0.88), cardiovascular mortality (0.81, 0.72 to 0.90), and cardiovascular disease (0.85, 0.80 to 0.90). High versus low consumption was associated with an 18% lower risk of incident cancer (0.82, 0.74 to 0.89). Consumption was also associated with a lower risk of several specific cancers and neurological, metabolic, and liver conditions. Harmful associations were largely nullified by adequate adjustment for smoking, except in pregnancy, where high versus low/no consumption was associated with low birth weight (odds ratio 1.31, 95% confidence interval 1.03 to 1.67), preterm birth in the first (1.22, 1.00 to 1.49) and second (1.12, 1.02 to 1.22) trimester, and pregnancy loss (1.46, 1.06 to 1.99). There was also an association between coffee drinking and risk of fracture in women but not in men.

> Conclusion: Coffee consumption seems generally safe within usual levels of intake, with summary estimates indicating largest risk reduction for various health outcomes at three to four cups a day, and more likely to benefit health than harm.
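To read the figures in the quote: a relative risk (RR) below 1 means lower risk in the higher-intake group, and the implied percentage change is (RR − 1) × 100, which is how 0.82 becomes the "18% lower risk" above. An association is only distinguishable from "no effect" at the 95% level when the whole confidence interval sits on one side of 1. A minimal Python sketch of both checks; the function names are illustrative and not from the paper:

```python
def percent_change(rr: float) -> float:
    """Percentage change in risk implied by a relative risk (negative means lower risk)."""
    return (rr - 1.0) * 100.0


def ci_excludes_one(low: float, high: float) -> bool:
    """True if the confidence interval lies strictly on one side of 1, the no-effect value."""
    return high < 1.0 or low > 1.0


if __name__ == "__main__":
    # Incident cancer, high vs low consumption: RR 0.82 (0.74 to 0.89)
    print(round(percent_change(0.82), 1), ci_excludes_one(0.74, 0.89))
    # Preterm birth, first trimester: 1.22 (1.00 to 1.49)
    print(ci_excludes_one(1.00, 1.49))
```

With the quoted numbers, the first-trimester preterm-birth interval has a reported lower bound of exactly 1.00, so it sits right at the boundary, while the low-birth-weight interval (1.03 to 1.67) excludes 1. The pregnancy figures are odds ratios rather than relative risks, so the percentage conversion doesn't carry over directly, but the "does the interval include 1" check reads the same way.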

When I'm looking for medical advice, I want that advice to list things like "coffee drinking might not be safe during pregnancy".

Furthermore, the statement 'Heavy consumption (6+ cups) can lead to anxiety, insomnia ...' assumes caffeinated coffee, yes? The paper I linked also discusses decaffeinated coffee, e.g.:

> High versus low intake of decaffeinated coffee was also associated with lower all cause mortality, with summary estimates indicating largest benefit at three cups a day (0.83, 0.85 to 0.89) [28] in a non-linear dose-response analysis. ...

> Coffee consumption was consistently associated with a lower risk of Parkinson’s disease, even after adjustment for smoking, and across all categories of exposure [22, 76, 77]. Decaffeinated coffee was associated with a lower risk of Parkinson’s disease, which did not reach significance. ...

> there were no convincing harmful associations between decaffeinated coffee and any health outcome.

That nuance seems important.

Also note that this paper is incomplete, as it investigated defined health outcomes, not physiological effects like anxiety. There are plenty more papers, like https://academic.oup.com/eurheartj/article/46/8/749/7928425?... , which considers what time of day people drink coffee, also discusses decaffeinated coffee, and highlights the uncertainty about the effects of heavy coffee drinking.

I don't see why I should care to ask an AI when it's so easy to find well-written research results which are far more likely to cover relevant edge cases.

Wouldn't a person respond the same way? What exactly do you expect the output to those questions to be?

Clickbait journalists answer like that; experts mostly don't. But it does make sense that it mimics clickbait journalists, since it was trained on the internet.

That's true and fair, and re-reading OP, it doesn't exactly address hallucinations either. I was thinking of it more as a toy example for non-tech folk (grandma?) to see that what you ask an LLM, and how you ask it, shapes how sycophantic the response will be. There may be better ways to demo that, though :shrug:

Both are true, though.