by FeepingCreature 701 days ago
Can you post examples?

Sure. Something I tried the other day was asking questions about modular arithmetic, specifically phrased in terms of quotient groups of the integers. Things like ‘how many homomorphisms are there between Z/12Z and Z/6Z?’. I was able to trip it up very easily with these sorts of questions, especially when it tries to ‘explain’ its answers and says ridiculous (but superficially and momentarily plausible-looking) things like ‘the only solutions to the equation 12x = 0 (mod 12) in Z/12Z are {[0], [3], [6], [9]}, therefore…’ (when in fact every element of Z/12Z is a solution, since 12x is always a multiple of 12).
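
For what it's worth, questions like these are easy to check by brute force, which makes the wrong answers easy to spot. A rough Python sketch (the function names are made up for illustration; it relies on the standard fact that a homomorphism out of a cyclic group is determined by where it sends [1]):

```python
from math import gcd


def count_homomorphisms(m: int, n: int) -> int:
    """Count group homomorphisms Z/mZ -> Z/nZ by brute force.

    A homomorphism is fixed by the image a of [1], and a is a valid
    image exactly when m*a == 0 (mod n).
    """
    return sum(1 for a in range(n) if (m * a) % n == 0)


def solutions(k: int, n: int) -> list[int]:
    """Return every x in Z/nZ with k*x == 0 (mod n)."""
    return [x for x in range(n) if (k * x) % n == 0]


print(count_homomorphisms(12, 6))                # 6
print(count_homomorphisms(12, 6) == gcd(12, 6))  # True
print(solutions(12, 12))                         # all residues 0..11
```

The count matches gcd(12, 6) = 6, which is the general answer for homomorphisms between finite cyclic groups, and the last line shows that 12x = 0 (mod 12) holds for all twelve elements rather than just four.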

You can also just quiz it on certain basic definitions. Ask it for examples of objects that don’t exist (graphs or categories with certain properties, etc.). Sometimes it’ll be adamant that its stated example works, but usually it will quickly apologise and admit to being wrong only to give almost exactly the same (broken) argument again.

Another thing you can try is concocting some question that isn’t even syntactically well-formed (i.e. fails even a type check) like ‘is it true that cyclic integer lattices are uniformly bounded below in the Riemann topology?’. I imagine that one is too far out to work, but when I’ve played around I’ve found many such absurd questions ChatGPT was only too happy to answer — with utter nonsense, of course. It’s interesting (and, I think, quite telling) that such systems are seemingly almost completely unable to decline to answer a question. And the reason is that there’s no difference between hallucination and non-hallucination. Internally, it’s exactly the same process. It either knows or doesn’t know — but it doesn’t know that it knows (or doesn’t).

LLMs basically only work on questions that are very similar to, or identical to, questions that have already been widely asked and answered online or in books… hence their lack of utility in mathematical research, or even in calculating one’s taxes, or whatever.

I could provide some more literal examples, but I’d have to go and try some and pick the ones that work, and even then they might not work on your end because of the pseudorandomness and the fact that the model keeps getting updated and patched. It’s better to just play around on your own based on the ideas I’ve given.

The moral is to use LLMs as a powerful way of finding information, but don’t trust anything they say. Use them to find better sources more quickly than you’d be able to via a search engine.

Well sure but ... I think the foundational problem here is just the "being unable to refuse to answer a request." The rest of the behavior you describe just follows from it.

If you, for instance, threatened to shoot a human if they refused a request or admitted they didn't know something, they might answer in a very similar fashion.