| HN Mirror



	by gitaarik 382 days ago
	I think it's not just token support, it's also having an understanding of certain concepts that allows you to arrive at new points like C, D, E, etc. But LLMs don't have an understanding of things, they are statistical models that predict what statistically is most likely following the input that you give it. But that will always be based on already existing data that is fed into the model. It can produce "new" stuff only by combining the "old" stuff in new ways, but it can't "think" of something entirely conceptually new, because it doesn't really "think".

JoshCole 382 days ago

> it can't "think" of something entirely conceptually new, because it doesn't really "think".

Hierarchical optimization (fast global + slow local) is a precise, implementable notion of "thinking." Whenever I've seen this pattern implemented, humans, without being told to do so by others in some forced way, seem to converge on the verb "think" to describe the operation. I think you need to blacklist the term "think" and avoid using it altogether if you want to think clearly about this subject, because you are allowing confusion in your use of language to come between you and understanding the mathematical objects that are under discussion.

> It can produce "new" stuff only by combining the "old" stuff in new ways,

False premise; previously debunked. Here is a refutation for you anyway, but made more extreme. Instead of modeling the language task with a predictive pre-training objective over a dataset, train only against a provided reward model. Such a setup never technically shows "old" stuff to the AI, because the AI is never shown anything explicitly. It just generates new things, and the reward model judges how well it did. The fact that it can generate while starting out knowing nothing shows that your claim that it can never generate something new is false: by definition, everything it generates at this point is new. Notice that as it continually generates new things and the judgements occur, it will learn concepts.
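
A toy illustration of that setup, as a rough sketch only. It swaps in plain random-search hill climbing for real policy training, and the names `make_reward_fn` and `train_from_reward_only` are made up for this example. The point it tries to show is narrow: the generator is never given any example text, only a number from a black-box scorer.

```python
import random
import string

ALPHABET = string.ascii_lowercase


def make_reward_fn(secret: str):
    """Build a black-box scorer. The generator only ever sees the returned number."""
    def reward(candidate: str) -> int:
        # Count positions where the candidate agrees with the hidden string.
        return sum(a == b for a, b in zip(candidate, secret))
    return reward


def train_from_reward_only(reward, length: int, steps: int, seed: int = 0) -> str:
    """Hill-climb using nothing but reward values (no dataset, no examples)."""
    rng = random.Random(seed)
    best = "".join(rng.choice(ALPHABET) for _ in range(length))
    best_score = reward(best)
    for _ in range(steps):
        # Propose a single-character change at a random position.
        pos = rng.randrange(length)
        proposal = best[:pos] + rng.choice(ALPHABET) + best[pos + 1:]
        score = reward(proposal)
        # Keep the proposal unless the scorer says it got worse.
        if score >= best_score:
            best, best_score = proposal, score
    return best


reward = make_reward_fn("hello")
result = train_from_reward_only(reward, length=5, steps=5000)
```

Every string the loop produces is generated rather than copied from data, and the only thing steering it is the score. A real system would update model weights instead of a single candidate string, but the information flow is the same.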

> But LLMs don't have an understanding of things, they are statistical models that predict what statistically is most likely following the input that you give it.

Try out Jaynes's Probability Theory: The Logic of Science. Within it, the various assumptions that underpin probability theory are shown to be very reasonable and normal and obviously good. Stuff like: represent plausibility with real numbers, keep rankings consistent and transitive, reduce to Boolean logic at certainty, and update so you never accept a Dutch-book sure loss. Together these force the ordinary sum and product rules of probability. Then notice that statistics is, in a certain sense, just what happens when you apply the rules of probability.
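
For reference, the sum and product rules mentioned above can be written as follows, where P(A | C) is read as the plausibility of A given background information C. The third line, Bayes' theorem, is not an extra assumption: it follows from writing the product rule for AB both ways round and dividing, assuming P(B | C) > 0.

```latex
\begin{align}
P(A \mid C) + P(\bar{A} \mid C) &= 1 && \text{(sum rule)} \\
P(AB \mid C) &= P(A \mid BC)\, P(B \mid C) && \text{(product rule)} \\
P(A \mid BC) &= \frac{P(B \mid AC)\, P(A \mid C)}{P(B \mid C)} && \text{(Bayes' theorem)}
\end{align}
```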

> also having an understanding of certain concepts that allows you to arrive at new points like C, D, E, etc. But LLMs don't have an understanding of things

This is also false. Look into the line of research that tends to go by the name of Circuits. It's been found that models have spaces within their weights that do correspond with concepts. Probably you don't understand what concepts are: abstractions and concepts are basically forms of compression that let you treat different things as the same thing. So a different way to arrive at knowing that this would be true is to consider a model with fewer parameters than there are items in its dataset, and notice that the model must successfully compress the dataset in order to complete its objective.
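
The compression argument can be seen in miniature with a deliberately tiny example. This is a sketch under obvious simplifying assumptions (noise-free data and a two-parameter linear model), not a claim about how large models store concepts. The helper name `fit_line` is made up for the example.

```python
def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept (closed form)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# 1000 items, all produced by one hidden rule that the model is never told.
data = [(float(x), 3.0 * x + 2.0) for x in range(1000)]
slope, intercept = fit_line(data)

# Two parameters now stand in for a thousand items.
max_error = max(abs(slope * x + intercept - y) for x, y in data)
```

With only two parameters the model cannot memorize a thousand points item by item. It reproduces them only because it has found the one regularity they share, which is the sense of "compression" used above.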

gitaarik 382 days ago

Yes ok, it can generate new stuff, but it's dependent on human-curated reward models to score the output to make it usable. So it still depends on human thinking; its own "thinking" is not sufficient. And there won't be a point when human-curated reward models are not needed anymore.

LLMs will make a lot of things easier for humans, because most of the thinking that humans do has been automated into the LLM. But ultimately you run into a limit where the human has to take over.

JoshCole 382 days ago

> dependent on human curated reward models to score the output to make it usable.

This is a false premise, because there already exist systems, currently deployed, which are not dependent on human-curated reward models.

Refutations of your point include existing systems which generate a reward model based on some learned AI scoring function, allowing self-bootstrapping toward higher and higher levels.

A different refutation of your point is the existing simulation contexts, for example the one used by R1, in which code compilation is used as a reward signal; here the reward comes from a simulator, not a human.
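
To make "compilation as a reward signal" concrete, here is a minimal sketch. It is not R1's actual pipeline, which this thread does not describe. It uses Python's built-in `compile()` as a stand-in for a compiler, and the function name `compile_reward` is made up for the example.

```python
def compile_reward(source: str) -> float:
    """Return 1.0 if `source` parses as Python, else 0.0. No human is in the loop."""
    try:
        # compile() only parses and byte-compiles; it does not run the candidate.
        compile(source, "<candidate>", "exec")
    except (SyntaxError, ValueError):
        return 0.0
    return 1.0


candidates = [
    "def add(a, b):\n    return a + b\n",  # valid
    "def add(a, b) return a + b",          # missing colon
]
scores = [compile_reward(c) for c in candidates]
```

A syntax check is about the weakest signal available. A more useful version would presumably also run the candidate against tests in a sandbox, but the score would still come from a program rather than from a person.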

> So it still depends on human thinking

Since your premise was false your corollary does not follow from it.

> And there won't be a point when human curated reward models are not needed anymore.

This is just a repetition of your previously false statement, not a new one. You're probably becoming increasingly overconfident by restating falsehoods in different words, potentially giving the impression you've made a more substantive argument than you really have.

gitaarik 382 days ago

So to clarify, it could potentially come up with (something close to) C, but if you want it to get to D, E, F, etc., it will become less and less accurate with each consecutive step, because it lacks the human-curated reward models up to that point. Only if you create new reward models for C will the output for D improve, and so on.

JoshCole 382 days ago

> Only if you create new reward models for C, the output for D will improve, and so on.

Again, tons of false claims. One is that 'you' have to create the reward model. Another that it has to be human-curated at all. Yet another is that you even need to do that at all: you can instead have the model build a bigger model of itself, train using its existing resources or more of them, then synthesize itself back down. Another way you can get around it is to augment the existing dataset in some way. No other changes except resource usage and yet the resulting model will be better, because more resources went into its construction.

Seriously notice: you keep making false claims again and again and again and again and again. You're not stating true things. You really need to reflect. If almost every sentence you speak on this topic is false, why is it that you think you should be able to persuade me to your views? Why should I believe your views, when you say so many things that are factually inaccurate, rather than my own views?

gitaarik 381 days ago

Ok, so you claim that LLMs can get smarter without human validation. So why do they hallucinate at all? And why are all reward models currently curated by humans? Or are you claiming they aren't?

JoshCole 381 days ago

I don't find it reasonable that you didn't understand my corrections, because current AIs already do. So I'm exiting the conversation.

https://chatgpt.com/share/683a3c88-62a8-8008-92ef-df16ce2e8a...

vidarh 382 days ago

> And there won't be a point when human curated reward models are not needed anymore.

This doesn't follow at all. There's no reason why a model cannot be made to produce reward models.

gitaarik 382 days ago

But reward models are always curated by humans. If you generate a reward model with an LLM, it will contain hallucinations that need to be corrected by humans. But that is what a reward model is for. To correct the hallucinations of LLMs.

So yeah theoretically you could generate reward models with LLMs, but they won't be any good, unless they are curated by other reward models that are ultimately curated by humans.

vidarh 382 days ago

> But reward models are always curated by humans.

There is no inherent reason why they need to be.

> So yeah theoretically you could generate reward models with LLMs, but they won't be any good, unless they are curated by other reward models that are ultimately curated by humans.

This reasoning is begging the question: The reasoning is true only if the conclusion is true. It's therefore a logically invalid argument.

There is no inherent reason why this needs to be the case.

gitaarik 382 days ago

Sorry but I don't follow your logic. Are you claiming that reward models that aren't curated by humans perform as well as ones that are?

Then what is a reward model's function according to you?
