by evantbyrne 840 days ago
It seems like it is getting tripped up on grammar. Do these models not deterministically preparse text input into a logical notation?

There's no preprocessing being done. This is pure computation, from the tokens to the outputs.

I was quite amazed that during 2014-2016, what was being done with dependency parsers, part-of-speech taggers, and named entity recognizers, using very sophisticated methods (graphical models, regret-minimizing policy learners, etc.), became fully obsolete for natural language processing. There was a period of sprinkling a hidden Markov model or conditional random field on top of neural networks, but even that disappeared very quickly.

There's no language modeling. Pure gradient descent into language comprehension.

I don’t think all of those tools have become obsolete. NER, for example, can be performed way more efficiently with spaCy than prompting a GPT-style model, and without hallucination.
There was this assumption that for high level tasks you’ll need all of the low level preprocessing and that’s not the case.

For example, machine translation attempts were morphing the parse trees, document summarization was pruning the grammar trees, etc.

I don’t know what your high level task is, but if it’s just collecting names then I can see how a specialized system works well. That said, the underlying model for this can also be a NN; having something like an HMM or CRF on top turned out to be unnecessary.

Oh, right. If the high-level task is to generate a translation or summary, I think that’s been swallowed up by the Bitter Lesson (though isn’t it an open question if decoder-only models are the best fit? I’d like to see a T5 with the scale and pretraining that newer models have had).

On the other hand, people seem to be using GPT-4 for simple text classification and entity extraction tasks that even a small BERT could do well at a fraction of the cost.

I agree it's neat on a technical level. However, as I'm sure the people making these models are well-aware, this is a pretty significant design limitation for matters where correctness is not a matter of opinion. Do you foresee the pendulum swinging back in the other direction once again to address correctness issues?
There is a very long-running joke in AI, going back to the 1970s (or maybe even earlier?) that goes something like, "quality of results is inversely proportional to the number of linguists working on the project".

It seems that every time we try it, we find out that when the model picks up the language structure on its own, it ends up being better at it than if we try to use our own understanding of language as a basis. Which does seem to imply that our own understanding is still rather limited and is not a very accurate model.

On the other hand, the fact that models get amazing translation capabilities just from training on different languages (seriously, if you are doing any kind of automated translation, do yourself a favor and try GPT-4) implies that there is a "there" there and the Universal Grammar people are probably correct. We just haven't figured out the specifics. Perhaps we will by doing "brain surgery" on those models, eventually.

The "other direction" was abandoned because it doesn't work well. Grammar isn't how language works, it's just useful fiction. There's plenty of language modelling in the weights of the trained model and that's much more robust than anything humans could cook up.
> Me: Be developer reading software documentation.

> itdoesntwork.jpg

Grammar isn't how language works, it's just useful fiction.

No* they are text continuations.

Given a string of text, what's the most likely text to come next.

You /could/ rewrite input text to be more logical, but what you'd actually want to do is rewrite input text to be the text most likely to come immediately before a right answer if the right answer were in print.

* Unless you mean inside the model itself. For that, we're still learning what they're doing.

No - that’s the beauty of it. The “computing stack” as taught in Computer Organization courses since time immemorial just got a new layer, imo: prose. The whole utility of these models is that they operate in the same fuzzy, contradictory, perspective-dependent epistemic space that humans do.

Phrasing it like that, it sounds like the stack has become analog -> digital -> analog, in a way…

No, they're a "next character" predictor - like a really fancy version of the auto-complete on your phone - and when you feed in a bunch of characters (e.g. a prompt), you're basically pre-selecting a chunk of the prediction. So to get multiple characters out, you literally loop through this process one character at a time.
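
To make the loop concrete, here is a minimal sketch of "predict one character, append it, repeat". It is a toy under loud assumptions: the bigram count table, the function names, and the training string are all made up for illustration. Real LLMs predict tokens (subword units) rather than single characters, and they use a neural network over the whole context instead of a lookup on the last character. Only the shape of the generation loop carries over.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each character, which characters follow it in `text`."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, context):
    """Return the most frequent follower of the last character of `context`.

    A real model would look at the whole context; this toy only looks at
    the final character.
    """
    followers = counts.get(context[-1])
    if not followers:
        return None  # this character was never followed by anything in training
    return followers.most_common(1)[0][0]


def generate(counts, prompt, n_chars):
    """Autoregressive loop: predict one character, append it, repeat."""
    out = prompt
    for _ in range(n_chars):
        nxt = predict_next(counts, out)
        if nxt is None:
            break
        out += nxt  # the prediction becomes part of the next step's input
    return out


counts = train_bigram("abcabcabc")
print(generate(counts, "a", 5))  # abcabc
```

The prompt plays exactly the role described above: it is just the part of the sequence that was fixed in advance, and everything after it comes out of the same loop.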

I think this is a perfect example of why these things are confusing for people. People assume there's some level of "intelligence" in them, but they're just extremely advanced "forecasting" tools.

That said, newer models get some smarts where they can output "hidden" Python code which will get run, and the result will get injected into the response (e.g. for graphs, math, web lookups, etc).

How do you know you’re not an extremely advanced forecasting tool?
If you're trying to claim that humans are just advanced LLMs, then say it and justify it. Edgy quips are a cop-out and not a respectful way to participate in technical discussions.
I am definitely not making this claim. I was replying to this:

> People assume there's some level of "intelligence" in them, but they're just extremely advanced "forecasting" tools.

My question wasn't meant as a quip. Rather it was literal-- how do you know your intelligence capabilities aren't "just extremely advanced forecasting"? We don't know for sure, and the answer is far from obvious. That doesn't mean humans are advanced LLMs-- we feel emotions, for instance. My comment was restricted to intelligence specifically.

You can make a human do the same task as an LLM: given what you've received (or written) so far, output one character. You would be totally capable of intelligent communication like this (it's pretty much how I'm talking to you now), so just the method of generating characters isn't proof of whether you're intelligent or not, and it doesn't invalidate LLMs either.

This "LLMs are just fancy autocomplete so they're not intelligent" is just as bad an argument as saying "LLMs communicate with text instead of making noises by flapping their tongues so they're not intelligent". Sufficiently advanced autocomplete is indistinguishable from intelligence.

The question isn't whether LLMs can simulate human intelligence, I think that is well-established. Many aspects of human nature are a mystery, but a technology that by design produces random outputs based on a seed number does not meet the criteria of human intelligence.
Why? People also produce somewhat random outputs, so?
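
On the seed point, a minimal sketch may help show what "random outputs based on a seed number" means in practice. Everything here is an assumption for illustration: the three candidate words, their scores, and the helper names are made up, and a real model would produce scores over a large vocabulary at every step. The sketch only shows that sampling is random across seeds but exactly reproducible for a fixed seed.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp((s - top) / temperature) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_sequence(scores, seed, n, temperature=1.0):
    """Draw `n` independent samples using a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, so the seed fully fixes the draws
    probs = softmax(scores, temperature)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return [rng.choices(tokens, weights=weights, k=1)[0] for _ in range(n)]


def greedy(scores):
    """No randomness at all: always pick the highest-scoring candidate."""
    return max(scores, key=scores.get)


# Hypothetical scores for the next word; not taken from any real model.
scores = {"cat": 2.0, "dog": 1.5, "fish": 0.1}

print(sample_sequence(scores, seed=42, n=5))
print(sample_sequence(scores, seed=42, n=5))  # same seed, same draws
print(greedy(scores))  # cat
```

So the randomness is a choice made at sampling time rather than something baked into the scores: with a fixed seed, or with greedy selection, the same input gives the same output every time.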