| HN Mirror

	by danielmarkbruce 594 days ago
	That isn't the way they work today. LLMs can easily find errors in outputs they themselves just produced. Start adding different prompts, different models and you get all kinds of ways to catch errors. Just like humans.
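A minimal sketch of the "different prompts, different models" idea, assuming the second pass is a separate reviewer call. The names `generate_with_review`, `generate`, `review`, `toy_generate` and `toy_review` are hypothetical stand-ins, not any real library's API, and the toy functions only simulate model calls:

```python
from typing import Callable, Optional


def generate_with_review(
    generate: Callable[[str, Optional[str]], str],
    review: Callable[[str, str], Optional[str]],
    task: str,
    max_rounds: int = 3,
) -> tuple[str, bool]:
    """Draft an answer, ask a reviewer for an objection, retry if there is one.

    `generate(task, feedback)` and `review(task, draft)` are hypothetical
    stand-ins for model calls; `review` returns None when it finds no error.
    """
    feedback: Optional[str] = None
    draft = ""
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        feedback = review(task, draft)
        if feedback is None:
            return draft, True  # reviewer accepted the draft
    return draft, False  # still rejected after max_rounds


# Toy stand-ins so the loop can be run without any model.
def toy_generate(task: str, feedback: Optional[str]) -> str:
    return "4" if feedback else "5"  # wrong first, corrected after feedback


def toy_review(task: str, draft: str) -> Optional[str]:
    return None if draft == "4" else "2 + 2 is not " + draft


result = generate_with_review(toy_generate, toy_review, "2 + 2")
print(result)  # ('4', True)
```

The loop is only as good as the reviewer: a `review` step that reports errors that are not there would throw away correct drafts just as readily.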

4 comments

Lio 594 days ago

I don’t think LLMs can easily find errors in their output.

There was a recent meme about asking LLMs to draw a wineglass full to the brim with wine.

Most really struggle with that instruction. No matter how much you ask them to correct themselves they can’t.

I’m sure they’ll get better with more input but what it reveals is that right now they definitely do not understand their own output.

I’ve seen no evidence that they are better with code than they are with images.

For instance, if the time to complete only scales with the number of tokens and not the complexity of their contents, then it's probably safe to assume the output is not being comprehended.

philipwhiuk 594 days ago

> LLMs can easily find errors in outputs they themselves just produced.

No. LLMs can be told that there was an error and produce an alternative answer.

In fact LLMs can be told there was an error when there wasn't one and produce an alternative answer.

danielmarkbruce 594 days ago

You don't use LLMs.

https://chatgpt.com/share/6722e41d-6b20-8002-8cbb-3012cd9179...

mavidser 594 days ago

https://chatgpt.com/share/672331d2-676c-8002-b8b3-10fc4c8d88...

In my experience, if you confuse an LLM by deviating from the "expected", then all the shims of logic seem to disappear, and it goes into hallucination mode.

danielmarkbruce 594 days ago

Try asking this question to a bunch of adults.

mavidser 588 days ago

Tbf that was exactly my point. An adult might use 'inference' and 'reasoning' to ask for clarification, or go with an internal logic of their choosing.

ChatGPT here went with a lexicographical order in Python for some reason, and then proceeded to make false statements from false observations, while also defying its own internal logic.

    "six" > "ten" is true because "six" comes after "ten" alphabetically.

No.

    "ten" > "seven" is false because "ten" comes before "seven" alphabetically.

No.
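Both quoted claims can be checked directly. Python compares strings lexicographically, code point by code point, so a short check shows what the comparisons actually evaluate to (the variable names here are only for illustration):

```python
# Python compares strings lexicographically, one code point at a time.
words = ["six", "ten", "seven"]

six_gt_ten = "six" > "ten"      # 's' (115) < 't' (116), so this is False
ten_gt_seven = "ten" > "seven"  # 't' (116) > 's' (115), so this is True

print(six_gt_ten, ten_gt_seven)  # False True
print(sorted(words))             # ['seven', 'six', 'ten']
```

So "six" sorts before "ten" and "ten" sorts after "seven", the opposite of both quoted statements.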

From what I understand of LLMs (which - I admit - is not very much), logical reasoning isn't a property of LLMs, unlike information retrieval. I'm sure this problem can be solved at some point, but a good solution would need development of many more kinds of inference and logic engines than there are today.

cdchn 594 days ago

Do you believe that the LLM understands what it is saying and is applying the logic that you interpret from its response, or do you think it's simply repeating similar patterns of words it's seen associated with the question you presented it?

danielmarkbruce 594 days ago

If you take the time to build an (S?)LM yourself, you'll realize it's neither of these. "Understands" is an ill-defined term, as is "applying logic".

But an LLM is not "simply" doing anything. It's extremely complex and sophisticated. Once you go from tokens into high-dimensional embeddings... it seems these models (with enough training) figure out how all the concepts go together. I'd suggest reading the word2vec paper first, then think about how attention works. You'll come to the conclusion these things are likely to be able to beat humans at almost everything.
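For readers who want something concrete to "think about", here is a rough sketch of scaled dot-product attention over toy embedding vectors. It is a simplification under stated assumptions: the vectors are made up, and the learned query/key/value projection matrices a real transformer uses are left out, so the embeddings stand in for all three roles.

```python
import math


def softmax(xs):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: each output is a weighted mix of values."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors, one coordinate at a time.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


# Made-up 2-d "embeddings" for three tokens (illustrative values only).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = attention(embeddings, embeddings, embeddings)
print(mixed)
```

Each output row is a blend of all the input vectors, weighted by how closely that token's vector matches the others.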

lomase 594 days ago

You said humans are machines that make errors and that LLMs can easily find errors in output they themselves produce.

Are you sure you wanted to say that? Or is it the other way around?

danielmarkbruce 594 days ago

Yes. Just like humans. It's called "checking your work" and we teach it to children. It's effective.

0points 594 days ago

> LLMs can easily find errors in outputs they themselves just produced.

Really? That must be a very recent development, because so far this has been a reason for not using them at scale. And no one is.

Do you have a source?

danielmarkbruce 594 days ago

Lots of companies are using them at scale.
