by unblough 976 days ago
You are unable to.

“LLMs can’t self-correct in reasoning tasks, DeepMind study finds”

https://news.ycombinator.com/item?id=37823543

Anyone who says otherwise is either ignorant of the underlying function of LLMs or trying to sell you something.

2 comments

>You are unable to.

This is just wrong lol.

GPT-4 logits calibration pre-RLHF - https://imgur.com/a/3gYel9r

Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback - https://arxiv.org/abs/2305.14975

Teaching Models to Express Their Uncertainty in Words - https://arxiv.org/abs/2205.14334

Language Models (Mostly) Know What They Know - https://arxiv.org/abs/2207.05221

> This is just wrong lol.

The needless condescension of your “lol” feels a bit premature.

How can you have self-correction without superintelligence?

Or, they realize that absence cannot be scientifically proven, and that scientists use language loosely, confusing those who take their loose language literally.

The study didn't "find" (discover) what they claim; rather, they didn't find validation that it "can" (the implementation of which varies per observer, sub-perceptually).

If you had to code something like this at work for a different domain, I bet you'd have no problem realizing that a nullable boolean is required to accurately model the problem space.
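To make the nullable boolean point concrete, here is a minimal Python sketch. The function names and the three labels are hypothetical, chosen only for illustration, and nothing here comes from the study itself. It assumes `None` stands for "no evidence either way", which is distinct from `False`.

```python
from typing import Optional


def summarize(finding: Optional[bool]) -> str:
    """Map a three-state finding to a claim (hypothetical labels)."""
    if finding is True:
        return "demonstrated"
    if finding is False:
        return "refuted"
    # None: the question was tested but nothing was shown either way.
    return "not demonstrated"


def summarize_lossy(finding: Optional[bool]) -> str:
    """Same mapping via truthiness, which merges None into False."""
    return "demonstrated" if finding else "refuted"


if __name__ == "__main__":
    for value in (True, False, None):
        print(value, summarize(value), summarize_lossy(value))
```

The only difference between the two functions is how `None` is handled: the first keeps "not demonstrated" as its own state, while the second reports it as "refuted".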

I recognize your clarification of “discovery” and conclusion from that research, but I do think there is a strong argument that in terms of the stochastic usage of a nonlinear system the “undefined” state of your nullable boolean is itself a falsey state.

You can argue whatever you like, but if the unknown IS actually known, why can't scientists tell us their secrets? How many people would have to be in on the scheme?

And this isn't just a one-off; it is a systemic, institutional shortcoming. I encounter several instances of it every day just in my regular social media feeds.