| HN Mirror



	by throw310822 115 days ago
	> The training data

	If the prompt is unique, it is not in the training data. That's true for basically every prompt. So how is this probability calculated?


cbovis 115 days ago

The prompt is unique but the tokens aren't.

Type "owejdpowejdojweodmwepiodnoiwendoinw welidn owindoiwendo nwoeidnweoind oiwnedoin" into ChatGPT and the response is "The text you sent appears to be random or corrupted and doesn’t form a clear question." That's because the prompt doesn't correlate to the training data.
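A minimal sketch of the "unique prompt, familiar tokens" point, assuming a toy greedy longest-match tokenizer and a hypothetical vocabulary (`VOCAB` and `tokenize` are illustrative names, not any real tokenizer's API). A sentence that may never have appeared whole still splits into known pieces, while keyboard-mash mostly falls back to single raw characters:

```python
# Hypothetical toy vocabulary; real subword tokenizers (e.g. BPE) learn
# tens of thousands of pieces from data.
VOCAB = {"the", "cap", "ital", "of", "ger", "many", " ", "?", "wh", "at", "is"}


def tokenize(text: str, vocab: set) -> list:
    """Greedy longest-match segmentation; unmatched characters become single-char tokens."""
    max_len = max(len(piece) for piece in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary piece that matches at position i first.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # No vocabulary piece matches here: fall back to one raw character.
            tokens.append(text[i])
            i += 1
    return tokens


# A novel sentence still maps onto known vocabulary pieces...
print(tokenize("what is the capital of germany?", VOCAB))
# ...while keyboard-mash mostly hits the single-character fallback.
print(tokenize("owejd", VOCAB))
```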

newswasboring 114 days ago

> The prompt is unique but the tokens aren't.

The tokens aren't unique, but the sequence is. Every input this model sees is unique. Even tokens are not as simple as they seem.

If you type "ejst os th xspitsl of fermaby?" in ChatGPT it responds with

> It looks like you typed “ejst os th xspitsl of fermaby?”, which seems like a garbled version of:

> “What is the capital of Germany?”

> The capital of Germany is Berlin.

> If you meant to ask something else, feel free to clarify!

edit: formatting

ajam1507 113 days ago

The prompt does correlate to its training data. In this case, since you sent random text, it generated the most likely response to random text.

HDThoreaun 114 days ago

Or because the text you sent was random and doesn't form a clear question?

hmmmmmmmmmmmmmm 115 days ago

...? what is the response supposed to be here?

hmmmmmmmmmmmmmm 115 days ago

Hamiltonian paths and previous work by Donald Knuth are more than likely in the training data.

red75prime 115 days ago

The specific sequence of tokens that comprises Knuth's problem together with an answer to it is not in the training data. A naive probability distribution based on counting the token sequences present in the training data would assign it a probability of 0. The trained network represents an extremely non-naive approach to estimating the ground-truth distribution (the distribution that corresponds to what a human brain might have produced).
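To make the "naive counting" point concrete, here is a sketch of a count-based bigram model on a hypothetical two-sentence corpus (function names are illustrative). With raw relative frequencies (`alpha=0.0`), any sequence containing an unseen bigram gets probability exactly 0, even when every individual token was seen. Additive (Laplace) smoothing, used here only as a far cruder stand-in for whatever a trained network does, gives that sequence a small nonzero probability:

```python
from collections import Counter


def train_bigrams(corpus):
    """Count how often each token is used as a context and each (prev, next) pair."""
    contexts, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence  # <s> marks the start of a sentence
        for prev, nxt in zip(tokens, tokens[1:]):
            contexts[prev] += 1
            bigrams[(prev, nxt)] += 1
    return contexts, bigrams


def sequence_prob(sentence, contexts, bigrams, vocab_size, alpha=0.0):
    """Probability of a sentence as a product of bigram probabilities.

    alpha=0.0 is the naive relative-frequency estimate;
    alpha>0 applies additive (Laplace) smoothing.
    """
    prob = 1.0
    tokens = ["<s>"] + sentence
    for prev, nxt in zip(tokens, tokens[1:]):
        denom = contexts[prev] + alpha * vocab_size
        if denom == 0:
            return 0.0  # context never seen and no smoothing: nothing to estimate from
        prob *= (bigrams[(prev, nxt)] + alpha) / denom
    return prob


# Hypothetical toy corpus.
corpus = [["the", "cat", "sat"], ["the", "dog", "ran"]]
contexts, bigrams = train_bigrams(corpus)
vocab_size = len({tok for s in corpus for tok in s})

unseen = ["the", "cat", "ran"]  # every token was seen, but never this sequence
p_seen = sequence_prob(["the", "cat", "sat"], contexts, bigrams, vocab_size)
p_naive = sequence_prob(unseen, contexts, bigrams, vocab_size)
p_smoothed = sequence_prob(unseen, contexts, bigrams, vocab_size, alpha=1.0)
print(p_seen, p_naive, p_smoothed)
```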

qsera 114 days ago

> the distribution that corresponds to what a human brain might have produced

But the human brain (or any other intelligent brain) does not work by generating a probability distribution over the next word. Even beings that do not have language can think and act intelligently.

hmmmmmmmmmmmmmm 114 days ago

You are always making predictions based on the context. That's why illusions can be so effective like these ones: https://illusionoftheyear.com/cat/top-10-finalists/2024/

astrange 114 days ago

LLMs also don't work by generating probability distributions of the next word. Your explanation can't account for how they generate words, let alone sentences.

qsera 114 days ago

That is exactly how they work.

astrange 114 days ago

No, a token is not a word.
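For reference, a sketch of what "a distribution over the next token" looks like, with invented scores: the output at each step is a softmax over a token vocabulary, and a word such as "Berlin" may span several tokens, so the word's probability is the product of the per-step token probabilities (chain rule). The logits and the ` Ber`/`lin` split are hypothetical, not taken from any real tokenizer or model:

```python
import math


def softmax(logits: dict) -> dict:
    """Turn raw scores into a probability distribution over the token vocabulary."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Hypothetical next-token scores after the context "The capital of Germany is".
# " Ber" and "lin" are subword tokens: the word "Berlin" takes two steps.
step1 = softmax({" Ber": 4.0, " Bonn": 1.0, " a": 0.5})
# Hypothetical scores after the extended context "... is Ber".
step2 = softmax({"lin": 5.0, "gen": 0.5})

# Chain rule: P(" Berlin") = P(" Ber") * P("lin" | " Ber").
p_berlin = step1[" Ber"] * step2["lin"]
print(max(step1, key=step1.get), round(p_berlin, 3))
```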

red75prime 114 days ago

[Citation needed] Neuroscience isn't yet at a point where it can say this with any certainty.

Anyway, it's not a theorem that you can be intelligent only if you fully imitate biological processes. Flight, after all, can be achieved without flapping wings.

qsera 114 days ago

>you can be intelligent only if you fully imitate biological processes

It is not that. It is about having an understanding of how it is trained. For example, if it was trained on ideas, instead of words, then it would be closer to intelligent behavior.

Someone will say that during training it builds ideas and concepts, but that is just a name we give to the internal representation that results from training; it is not an actual idea or concept. When it learns the word "car", it does not understand it as a concept, only as a word and how that word relates to other words. This lets it generate text involving "car" that is consistent, projecting an appearance of intelligence.

It is hard to propose a test for this, because it will become the next target for the AI companies to optimize for, and maybe the next model will pass it.
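A minimal sketch of the "a word is represented only by how it relates to other words" idea, using a classic count-based co-occurrence model on a hypothetical toy corpus. This is not how LLMs learn their embeddings (those are parameters trained by gradient descent); it only shows that "car" and "truck" can end up with near-identical vectors purely because they appear in the same contexts, while "cat" does not:

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus; each sentence counts as one co-occurrence window.
CORPUS = [
    "the car drove on the road",
    "the truck drove on the road",
    "the car needs fuel",
    "the truck needs fuel",
    "the cat drank milk",
    "the cat chased the mouse",
]


def cooccurrence_vectors(corpus):
    """Represent each word only by counts of the other words in its sentences."""
    vectors = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for i, w in enumerate(words):
            for j, other in enumerate(words):
                if i != j:
                    vectors[w][other] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vectors = cooccurrence_vectors(CORPUS)
print(cosine(vectors["car"], vectors["truck"]), cosine(vectors["car"], vectors["cat"]))
```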

red75prime 114 days ago

The latest models are mostly LMMs (large multimodal models). If a model builds an internal representation that integrates all the modalities we are dealing with (robotics even provides tactile inputs), it becomes harder and harder to imagine why those representations should be qualitatively different.

hmmmmmmmmmmmmmm 113 days ago

Obviously there is some level of memorisation involved. That's why you can even get LLMs to reproduce passages of Harry Potter from memory with perfect precision.
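The simplest form of memorization can be sketched with a count-based trigram model trained on a single passage (a pangram stands in for any memorized text here; the function names are illustrative). Because every two-word context in the passage has exactly one continuation, greedy decoding from the first two words regenerates the passage verbatim. This is only an analogy: an LLM does not store text in a lookup table like this:

```python
from collections import Counter, defaultdict


def train_trigrams(tokens: list):
    """Map each two-token context to counts of the token that followed it."""
    table = defaultdict(Counter)
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        table[(a, b)][c] += 1
    return table


def greedy_generate(table, prompt: list, max_new: int = 50) -> list:
    """Repeatedly append the most frequent continuation; stop at an unseen context."""
    out = list(prompt)
    for _ in range(max_new):
        context = (out[-2], out[-1])
        if context not in table:  # membership test does not create a key
            break
        out.append(table[context].most_common(1)[0][0])
    return out


passage = "the quick brown fox jumps over the lazy dog".split()
table = train_trigrams(passage)
regenerated = greedy_generate(table, passage[:2])
print(" ".join(regenerated))
```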

qsera 115 days ago

Just using a scaled-up and cleverly tweaked version of linear regression analysis...

red75prime 115 days ago

That is, the probability distribution that the network should learn is defined by which probability distribution the network has learned. Brilliant!