

by foobarqux 452 days ago

> "Of course parsing has occurred. Your history with this person (and people in general) and what you know he likes to say, his mood and body language. Still probabilistic."

This is just redefining terms to be so vague as to make rational inquiry or discussion impossible. I don't know what redefinition of parsing you could be using that would still be in any way useful, or what "probabilistic" is supposed to apply to in that case.

If you are saying that, because the brain is constantly predicting various things, any process automatically counts as probabilistic even when it doesn't involve prediction, then that is just useless.

> Common sentences are not necessarily structurally simpler and those still get processed faster so yes it's pretty true.

Well, I'll have to take your word for it, as you haven't cited the paper, but I would point to the reasonable explanation of different processing times I gave further below, which has nothing to do with parsing. But I will repeat the vision analogy: if an experiment showed that I took longer to react to an unusual visual sequence, we would not immediately conclude that the visual system was probabilistic. The more parsimonious explanation is that the visual system is deterministic and some other part of cognition takes longer (or is recomputed) because of the "surprise".

> So? LLMs can do this. I'm not even sure why you would think probabilistic predictors couldn't.

It's not about capturing it in statistics or having an LLM produce it; it's about explaining why that rule occurs and not some other. That's the difference between explanation and description.

> That's one way to do it yeah. Why would I 'believe in it' ? Computers that rely on it don't work anywhere near as well as those that don't. What evidence is there to it being anything more than a nice simplification ?

Because producing one token at a time cannot produce the arbitrarily recursive structures that sentences can have? Because no language's grammatical rules rely on linear order? Because when we express a thought, it usually can't be reduced to a single start word followed by the statistically most likely next-word continuations? It's also irrelevant what computers do; we are talking about what humans do.

> Why does a LLM that encounters a novel form of that sentence generate the question form correctly ?

That isn't the question. The question is why it's that way and not another. It's as if I asked why the planets move in a certain pattern and you responded with "well, why does my deep neural net predict it so well?". It's just nonsense.

> You are giving examples that probabilistic approaches are clearly handling as if they are examples that probabilistic approaches cannot explain. It's bizarre

No probabilistic model has explained anything. You are confusing predicting with explaining.

> I'm not implying it's unreasonable. I'm telling you the brain clearly does not process language this way because even structurally simple but uncommon syntax is processed slower.

I explained why you would expect that to be the case even with deterministic processing.

> What's the point of describing Newton's model as fiction if I still teach it in high schools and Universities? Because erroneous models can still be useful.

Well, as I said, this is also true of Einstein's theory of gravity, and you presumably brought up the point to contrast universal grammar with that theory rather than to point out the similarities.

> The brain does not comprehend a sentence without trying to predict its meaning. They aren't orthogonal. They're intrinsically linked

The brain is doing lots of things; we are talking about the language system. Again, if we were instead talking about the visual system, no one would dispute that the visual system is doing the "seeing" and other parts of the brain are doing the predicting.

In fact, they must be orthogonal, because once you get to the end of the sentence, where there are no next words to predict, you can still parse it even if all your predictions were wrong. So the main deterministic processing bit (universal grammar) still needs to be explained, and the ancillary "probabilistic" next-word-prediction part is not relevant to its explanation.