Hacker News
by dartos 488 days ago
I don’t really understand what you’re testing for?

Language, as a problem, doesn’t have a discrete solution like the question of whether a list is sorted or not.

Seems weird to compare one to the other, unless I’m misunderstanding something.

What’s more, the entire notion of a sorted list was provided to the LLM by how you organized your training data.

I don’t know the details of your experiment, but did you note whether the lists were sorted in ascending or descending order?

Did you compare which kind of sorting was most common in the output and in the training set?

Your bias might have snuck in without you knowing.
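
A minimal sketch of that comparison, assuming the training lists and the model outputs are both available as Python lists of integers. The function names `sort_direction` and `direction_counts` are made up for illustration and are not from the experiment being discussed.

```python
from collections import Counter

def sort_direction(xs):
    """Classify a list as 'ascending', 'descending', 'constant', or 'unsorted'."""
    pairs = list(zip(xs, xs[1:]))
    asc = all(a <= b for a, b in pairs)
    desc = all(a >= b for a, b in pairs)
    if asc and desc:
        return "constant"  # all elements equal, or fewer than two elements
    if asc:
        return "ascending"
    if desc:
        return "descending"
    return "unsorted"

def direction_counts(lists):
    """Count how many lists fall into each direction category."""
    return Counter(sort_direction(xs) for xs in lists)
```

Running `direction_counts` once on the training targets and once on the model outputs would show whether one direction dominates in both.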

3 comments

> I don’t really understand what you’re testing for?

For this hypothesis: The intelligence illusion is in the mind of the user and not in the LLM itself.

And yes, the notion was provided by the training data. It indeed had to learn that notion from the data, rather than parrot memorized lists or excerpts from the training set, because the problem space is too vast and the training set too small to brute force it.

The output lists were sorted in ascending order, the same way that I generated them for the training data. The sortedness is directly verifiable without me reading between the lines to infer something that isn't really there.
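
A sketch of what "directly verifiable" could look like in practice. The permutation check is an added assumption (the comment only mentions sortedness), and `is_correct_sort` is a hypothetical name.

```python
def is_correct_sort(inputs, output):
    """True if output is in ascending order and is a permutation of inputs."""
    ascending = all(a <= b for a, b in zip(output, output[1:]))
    # Assumption: a valid answer must also reuse exactly the input elements.
    same_items = sorted(inputs) == sorted(output)
    return ascending and same_items
```

The check is mechanical: it needs no human judgment about whether the output "looks" right.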

A large number of commenters are under the illusion that LLMs are "just" stochastic parrots and can't generalise to inputs not seen in their training data. He was proving that that isn't the case.

Not saying I disagree with the thesis, but I don’t think this proves anything.

If every pair of digits appears sorted in the dataset, then that could still be “just” a stochastic parrot.

I’m kind of interested to see if an LLM can sort when the dataset specifically omits comparisons between certain pairs of numbers.

Also, I don’t think OC was responding to commenters, but to the article.

It might seem like you could sort with just pairwise correlations, but on closer analysis, you cannot. Generating the next correct token requires correctly weighing the entire context window.

Of course, that’s how attention works, after all.

But by specifically avoiding certain cases, we could verify if the model is generalizing or not.

I mean that needing to scan the full context of tokens before the nth is inherent to the problem of sorting. Transformers do scan that input, which is good; it's not surprising that they're up to the task. But pairwise numeral correlations will not do the job.
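
To illustrate why the previous token alone is not enough, here is a small sketch. It assumes each list element is a single token and that the model emits the sorted list one element at a time; `next_sorted_token` is a hypothetical helper, not part of any experiment described here.

```python
from collections import Counter

def next_sorted_token(inputs, emitted):
    """Return the next element of the ascending output, or None when done.

    The answer depends on the whole input multiset minus everything
    already emitted, not just on the most recently emitted element.
    """
    remaining = Counter(inputs) - Counter(emitted)
    if not remaining:
        return None
    return min(remaining.elements())

# Same last emitted token (1), different correct continuations:
a = next_sorted_token([3, 1, 2], [1])
b = next_sorted_token([3, 1, 5], [1])
```

Here `a` and `b` differ even though both outputs so far end in 1, so a table of "which numeral tends to follow which" cannot pick the right continuation by itself.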

As for avoiding certain cases, that could be done to some extent. But remember that the untrained transformer has no preconception of numbers or ordering (it doesn't use the hardware ALU or integer data type) so there has to be enough data in the training set to learn 0<1<2<3<4<5<6, etc.

> there has to be enough data in the training set to learn 0<1<2<3<4<5<6

This is the kind of thing I’d want it to generalize.

If I avoid having 2 and 6 together in the same unsorted list in the training set, will lists containing both numbers be sorted correctly in the test set, and at the same rate as other lists?

My intuition is that, yes, it would. But it’d be nice to see and would be a clear demonstration of the ability to generalize at all.
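
A rough sketch of how such a split could be built, assuming lists of single digits drawn uniformly at random. The list length, list count, seed, and all names are illustrative assumptions.

```python
import random

HELD_OUT_PAIR = frozenset({2, 6})  # the pair named in the comment above

def contains_held_out_pair(xs):
    """True if the list contains both members of the held-out pair."""
    return HELD_OUT_PAIR <= set(xs)

def make_split(n_lists, length, seed=0):
    """Generate random digit lists; route any list with both 2 and 6 to the held-out set."""
    rng = random.Random(seed)
    train, held_out = [], []
    for _ in range(n_lists):
        xs = [rng.randrange(10) for _ in range(length)]
        (held_out if contains_held_out_pair(xs) else train).append(xs)
    return train, held_out

train, held_out = make_split(n_lists=500, length=5)
```

The training set still contains 2 and 6 individually, so the model can see each of them compared against other digits. Only their co-occurrence is withheld, and accuracy on `held_out` can then be compared against accuracy on ordinary unseen lists.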

The commenter is merely saying that LLMs are indeed able to approximate arbitrary functions, with sorting as the example.

It is nothing new and has been well established in the literature since the 90s.

The shared article really is not worth the read and mostly reveals an author who does not know what he writes about.

You’re talking specifically about perceptrons and feed forward neural networks.

LLMs didn’t exist then. Attention only came out in 2017…

Yes? Are you saying that attention is less expressive?

I’m saying that LLMs (models trained on language specifically) are not automatically capable of the same generic function solving.

The network itself can be trained to solve most functions (or all, I forget precisely if NNs can solve all functions)

But the language model is not necessarily capable of solving all functions, because it was already trained on language.