by genrilz 595 days ago
You might not be able to sell someone a library that fixes all bugs, but you can sell (or give away) software systems that reduce the number of bugs. Doing that is pretty useful.

Examples include linters, fuzzers, testing frameworks, and memory-safe programming languages (as in Rust, but also as in any language with a GC). All of these reduce the number of bugs in the final product: most do so by giving you a way to detect bugs, while memory-safe languages simply eliminate a whole class of them. The paper is advertising a method to detect whether a given output is likely to be affected by a "bug", along with a taxonomy of the symptoms of such bugs. It doesn't provide a way to fix them, and hallucinations don't necessarily have a single cause. Some hallucinations might be fixed by contextual calibration [0]; others might be fixed by adding more training data similar to the wrong example.
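For anyone unfamiliar with contextual calibration: roughly, as described in [0], you ask the model for its label probabilities on a content-free input (e.g. "N/A") to estimate the bias baked into the prompt, then divide the real probabilities by those and renormalize. A minimal sketch, assuming you already have the probabilities as plain Python lists (the function name and the numbers are made up for illustration):

```python
def contextual_calibrate(label_probs, content_free_probs):
    """Rescale label probabilities by the model's bias on a content-free input.

    label_probs: the model's probability for each label on the real input.
    content_free_probs: the model's probability for the same labels when the
        input slot in the prompt is filled with a content-free string like "N/A".
    """
    # Dividing by the content-free probabilities corresponds to W = diag(p_cf)^-1.
    scaled = [p / cf for p, cf in zip(label_probs, content_free_probs)]
    total = sum(scaled)
    # Renormalise so the calibrated scores sum to 1 again.
    return [s / total for s in scaled]


# Made-up numbers: the prompt biases the model toward label 0 even on "N/A".
raw = [0.70, 0.30]
content_free = [0.80, 0.20]
calibrated = contextual_calibrate(raw, content_free)
print([round(p, 3) for p in calibrated])  # [0.368, 0.632]
```

With these numbers the raw prediction is label 0, but once the prompt's built-in bias is divided out the prediction flips to label 1. Whether that flip is actually correct depends on how well the content-free input captures the bias.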

In any case, you need to find the bad outputs before you can perform any fixes. Because LLMs tend to be used to produce "fuzzy" outputs with no single right answer, traditional testing frameworks and the like aren't always applicable.

[0] https://learnprompting.org/docs/reliability/calibration


Yeah, for sure, but the claim in the article is something like "we found the line in the compiler code that causes bugs" or "we found the bytes in the compiled object that cause bugs".

It's presented as a panacea.

To me, the claims in the article read more like "we have found a way to identify execution paths in a common compiler architecture (the transformer architecture, in the case of LLMs) that are often, but not always, associated with buggy code". That seems like a reasonable claim to make.
Additionally, I get the impression you may be suspecting research malpractice. Obviously I don't have insider knowledge, but I would note that training probes on the middle layers of the model wasn't these authors' idea; this paper cites other papers that already did exactly that. The contribution of this paper is simply that focusing on the middle layers at certain "critical tokens" gives a better signal than checking the middle layers at every token.
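To make the "critical tokens vs. every token" point concrete, here's a toy sketch. It does not use a real model: the "middle-layer activations" are synthetic, and I've assumed, purely for illustration, that only one critical position carries information about whether the output is wrong. It trains the same logistic-regression probe twice, once on the critical token's activations and once on activations mean-pooled over all tokens (mean-pooling is just one simple way of "using every token", and my choice, not necessarily what the cited papers do). All names and sizes are hypothetical.

```python
import math
import random

random.seed(0)
DIM, SEQ_LEN, CRITICAL = 4, 10, 7  # hypothetical sizes; CRITICAL = index of the critical token


def fake_middle_layer(label):
    """Synthetic stand-in for one sequence of middle-layer activations.

    Assumption for illustration: only the critical token carries a signal
    about the label; every other position is pure noise.
    """
    seq = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(SEQ_LEN)]
    seq[CRITICAL] = [random.gauss(0, 0.5) for _ in range(DIM)]
    seq[CRITICAL][0] += 2.0 if label == 1 else -2.0
    return seq


def mean_pool(seq):
    # Average each feature over all token positions.
    return [sum(tok[i] for tok in seq) / len(seq) for i in range(DIM)]


def train_probe(xs, ys, lr=0.1, epochs=300):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    w, b = [0.0] * DIM, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw, gb = [0.0] * DIM, 0.0
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1 / (1 + math.exp(-z)) - y  # derivative of log loss w.r.t. z
            for i in range(DIM):
                gw[i] += err * x[i]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def accuracy(probe, xs, ys):
    w, b = probe
    preds = [int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0) for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


labels = [random.randint(0, 1) for _ in range(400)]  # 1 = "hallucinated" (synthetic)
seqs = [fake_middle_layer(y) for y in labels]
train, test = slice(0, 200), slice(200, 400)

# Probe A: features taken only from the critical token.
crit = [s[CRITICAL] for s in seqs]
# Probe B: features averaged over every token.
pooled = [mean_pool(s) for s in seqs]

crit_acc = accuracy(train_probe(crit[train], labels[train]), crit[test], labels[test])
pooled_acc = accuracy(train_probe(pooled[train], labels[train]), pooled[test], labels[test])
print(f"critical-token probe: {crit_acc:.2f}, mean-pooled probe: {pooled_acc:.2f}")
```

In this toy setup the critical-token probe should do much better, because averaging over ten positions dilutes the one informative token with nine noisy ones. That's the intuition only, not a reproduction of the paper's results: with a real model, which tokens count as "critical" and which layers count as "middle" are empirical questions.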

It's of course possible that this paper in particular is fraudulent, but note that there is a whole field of research making the same basic claim as this paper, so this isn't some one-off thing. A fair number of people from different institutions would need to be in on it for the entire field to be fraudulent.

Alternatively, I think you may be objecting to the use of the word "truthfulness" in the paper's abstract, because you seem to think that only human thoughts can be true or false. I'm not actually going to dispute that, but as in my response to your koan comment, the user can interpret the LLM's output, and the thought that interpretation produces in the user's head does have a true or false value.

In this case, philosophically, you can think of this paper as trying to find cases where the LLM outputs strings that the user interprets as false. I think the authors are probably thinking of truth and falsity more as properties of sentences, and thus as something mere strings can possess regardless of how they were created. That is also a philosophically valid way to look at it, but it differs from your view in a way that may have made their claims seem absurd to you.