| HN Mirror


by famouswaffles 654 days ago

>the model will learn what it can to minimize the error over the specific provided loss function, and no more. Change the loss function and you change what the model learns.

You clearly do not really understand what it means to predict internet-scale text with increasing accuracy. No more than that? Fantastic.

LLMs do not just learn surface statistics. So many papers have thoroughly debunked this that I'm just not going to bother. This is just straight-up denial.

This has been clearly shown in chess as well. https://arxiv.org/abs/2403.15498v2

You have no idea what you are talking about. You've probably never even played with 3.5-turbo-instruct. That's how you can say this nonsense. You have your conclusion and keep working backwards to get a justification.

>It's interesting that LLMs can reach the ELO level that they do (says more about chess than about LLMs)

When you say this for everything LLMs can do then it just becomes a meaningless cope statement.


HarHarVeryFunny 654 days ago

No of course not - they also learn whatever is necessary, and possible, in order to replicate those surface statistics (e.g. understanding of fairy tales, etc, as I noted).

However, you seem to be engaged in magical thinking and believe these models are learning things beyond their architectural limits. You appear to be starstruck by what these models can do, and blind to what one can deduce - and SEE - that they are unable to do.


famouswaffles 654 days ago

You've said a lot of things about LLM chess performance that are not true and can easily be shown to be not true. There is literally evidence right there that shows the model learning the board state, rules, player skills etc.

And then you've tried to paper over being shown that with a conveniently vague and nonsensical, "says more about bla bla bla". No, you were wrong. Your model about this is wrong. It's that simple.

You start from your conclusions and work your way backwards from them. "Pattern matching technique"? Please. By all means, explain to all of us what this actually entails in a way we can test for it. Not just vague words.


HarHarVeryFunny 654 days ago

An LLM will learn what it CAN (and needs to, to reduce the loss), but not what it CAN'T. How difficult is that to understand?!

Tracking probable board state given a sequence of moves (which don't even need to go all the way back to the start of the game!) is relatively simple to do, and doesn't require hundreds of sequential steps that are beyond the architecture of the model. It's just a matter of incrementally updating the current board state "hypothesis" per each new move (essentially: "a knight just moved to square X, so it must have moved away from some square a knight's move away from X that we believe currently contains a knight").
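For readers who want the incremental update made concrete, here is a minimal Python sketch of the knight example described above. It is only an illustration of the idea, not a claim about what any model computes internally. The board representation and the helper names (`parse_square`, `knight_origins`, `apply_knight_move`) are hypothetical, and it ignores captures, colours, and move disambiguation.

```python
# Hypothetical sketch: update a board-state "hypothesis" from one knight move.
# Squares are (file, rank) index pairs in 0..7; the board maps square -> piece letter.

KNIGHT_OFFSETS = [(1, 2), (2, 1), (2, -1), (1, -2),
                  (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def parse_square(name):
    """Convert an algebraic square name such as 'f3' into (file, rank) indices."""
    return ord(name[0]) - ord("a"), int(name[1]) - 1


def knight_origins(board, target, piece="N"):
    """Squares a knight's move away from `target` that we believe hold `piece`."""
    tf, tr = target
    origins = []
    for df, dr in KNIGHT_OFFSETS:
        sq = (tf + df, tr + dr)
        on_board = 0 <= sq[0] < 8 and 0 <= sq[1] < 8
        if on_board and board.get(sq) == piece:
            origins.append(sq)
    return origins


def apply_knight_move(board, target_name, piece="N"):
    """Return a new hypothesis after a knight lands on `target_name`."""
    target = parse_square(target_name)
    origins = knight_origins(board, target, piece)
    if len(origins) != 1:
        # Real notation disambiguates (e.g. "Nbd2"); this sketch does not.
        raise ValueError(f"expected exactly one origin, found {len(origins)}")
    new_board = dict(board)
    del new_board[origins[0]]   # the knight left its old square
    new_board[target] = piece   # and now stands on the target square
    return new_board


# Two knights on their home squares; "Nf3" can only have come from g1.
board = {parse_square("g1"): "N", parse_square("b1"): "N"}
board = apply_knight_move(board, "f3")
```

Each move touches only the squares involved, so the update is a small constant amount of work per move rather than a replay of the whole game.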

Ditto for estimating a player's Elo rating in order to predict appropriately good or bad moves. It's basically just a matter of how often the player makes the same move as other players of a given Elo rating in the training data. No need for hundreds of steps of sequential computation that are beyond the architecture of the model.
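One way to read "how often the player makes the same move as other players of a given rating" is as a simple per-band frequency count. The sketch below uses a naive smoothed likelihood for that, which is a stand-in chosen for illustration and not something the comment or any paper specifies. The rating bands, position keys, moves, and the `vocab_size` smoothing parameter are all made-up toy values.

```python
import math
from collections import defaultdict


def fit_move_counts(games):
    """Count moves per (rating band, position) from (band, position, move) triples."""
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for band, position, move in games:
        counts[band][position][move] += 1
    return counts


def band_log_likelihood(counts, band, observed, smoothing=1.0, vocab_size=20):
    """Log-probability of the observed moves under one band's move frequencies.

    `vocab_size` is an assumed number of candidate moves, used only for
    add-one style smoothing so unseen moves do not get probability zero.
    """
    total = 0.0
    for position, move in observed:
        move_counts = counts[band][position]
        n_move = move_counts.get(move, 0)
        n_all = sum(move_counts.values())
        total += math.log((n_move + smoothing) / (n_all + smoothing * vocab_size))
    return total


def estimate_band(counts, observed):
    """Pick the rating band whose players most often made the same moves."""
    bands = list(counts)
    return max(bands, key=lambda b: band_log_likelihood(counts, b, observed))


# Toy, fabricated-for-illustration training triples (not real game data).
training = [
    ("1200", "pos1", "h4"), ("1200", "pos1", "h4"), ("1200", "pos1", "Nf3"),
    ("2200", "pos1", "Nf3"), ("2200", "pos1", "Nf3"), ("2200", "pos1", "d4"),
    ("1200", "pos2", "Qh5"), ("1200", "pos2", "Qh5"), ("1200", "pos2", "Nc3"),
    ("2200", "pos2", "Nc3"), ("2200", "pos2", "Nc3"), ("2200", "pos2", "Bc4"),
]
counts = fit_move_counts(training)
strong_guess = estimate_band(counts, [("pos1", "Nf3"), ("pos2", "Nc3")])
weak_guess = estimate_band(counts, [("pos1", "h4"), ("pos2", "Qh5")])
```

With these toy counts, the first set of observed moves matches the "2200" band more often and the second matches the "1200" band, and the work is one lookup per observed move.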

Doing an N-ply lookahead to reason about potential moves is a different story, but you want to ignore that and instead throw out a straw man "counter argument" about maintaining board state, as if that somehow proves that the LLM can magically apply more than N (= number of layers) steps of sequential reasoning to derive moves. Sorry, but this is precisely magical, faith-based thinking: "it can do X, so it can do Y", without any analysis of what it takes to do X and Y and why one is possible and the other is not.
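For reference, this is roughly what an explicit N-ply lookahead looks like when written as a search procedure. It is a minimal depth-limited negamax over a toy tree, offered only to make the term concrete; it says nothing about whether a transformer does or does not approximate it. The tree, the leaf scores, and the zero "static evaluation" at the depth cutoff are all assumptions for illustration.

```python
def negamax(node, depth):
    """Depth-limited negamax over a toy game tree.

    A node is either a number (a leaf score from the perspective of the
    side to move at that leaf) or a list of child nodes.
    """
    if isinstance(node, (int, float)):
        return node
    if depth == 0:
        # Assumed stand-in for a static evaluation of an unexplored position.
        return 0
    # A child's value is from the opponent's perspective, so negate it.
    return max(-negamax(child, depth - 1) for child in node)


# Toy tree: two candidate moves at the root, one line going three plies deep.
tree = [[3, -2], [5, [-4, 1]]]

full_value = negamax(tree, 3)     # searches every line to its leaves
shallow_value = negamax(tree, 2)  # cuts off the three-ply line early
```

In this naive recursive form, the value at each ply is computed from the values one ply below it, and on this toy tree the two-ply and three-ply searches return different values for the root.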


famouswaffles 654 days ago

>An LLM will learn what it CAN (and needs to to reduce the loss), but not what it CAN'T. How difficult is that to understand?!

Right and the point is that you don't know what it CAN'T learn. You clearly don't quite understand this because you say stuff like this:

>Chess is a good example, since it's easy to understand. The generative process for world class chess (whether human, or for an engine) involves way more DEPTH (cf layers) of computation than the transformer has available to model it

and it's just baffling because:

1. Humans don't play chess anything like chess engines. They literally can't, because the brain has iterative computation limits well below those of a computer. Most Grandmasters are only evaluating 5 to 6 moves deep on average.

2. We have a chess transformer playing world class chess (grandmaster level) - https://arxiv.org/abs/2402.04494.

You keep trying to make the point that because a Transformer architecturally has a depth limit N for any given trained model, it cannot reach human level.

But this is nonsensical for various reasons.

- Nobody is stopping you from just increasing N such that every GI problem we care about is covered.

- You have shown literally no evidence that the N even state-of-the-art models possess today is insufficient to match human iterative compute ability.

GPT-4o zero-shots arbitrary arithmetic more accurately than any human brain, and that's supposedly something it's bad at. You can clearly see it can reach world class chess play.

If you have some iterative computation benchmark that shows transformers zero-shotting worse than an unaided human, then feel free to show me.


HarHarVeryFunny 654 days ago

OK - you win. Today's LLMs are just as good as humans at reasoning.

Why don't you write to Sam Altman to tell him the good news?

Tell him there's nothing stopping him from "increasing N" until the thing gets up and walks out the door.


famouswaffles 654 days ago

I did not claim the state of the art was better at all forms of reasoning than all humans. I claimed the architecture isn't going to stop it from being so in the future, but I guess constructing a strawman is always easier, right?

There are benchmarks that rightfully show the SOTA behind average human performance in other aspects of reasoning, so why are you fumbling so much to demonstrate this with unaided iterative computation? It's your biggest argument, so I just thought you'd have something more substantial than "It's limited bro!"

You cannot even demonstrate this today, never mind for some hypothetical scaled-up model.

I think Sam will be just fine.
