by jbay808 1094 days ago
> what do you think would happen if you gave your model an alphanumeric list to sort? Did you try that?

The tokenizer would throw an exception, because it doesn't have any tokens to represent alphabetical characters. But you tell me - if I had tokenized alphabetical characters and defined an ordering, would you expect the results to be any different?

> You say e.g. that "LLM is learning an n-gram"[...] you can't "learn an n-gram".

Where do I say that? I don't think I make any reference to "learning an n-gram", which is a relief because I don't know what it would mean to "learn an n-gram".

> There's plenty of error in the figure where you show its accuracy (not clear if that's training or test accuracy).

Test accuracy, measured between training iterations. It isn't part of the training process itself, which uses its own validation set split off from the training set. And yes, I agree, it is not error-free, and I wouldn't expect it to be, especially after so little training. What the figure shows is the percentage of sorts that were completely error-free, and how rapidly the error rate falls. I've since repeated the test at finer resolution, and the fraction of imperfect sorts keeps decreasing about as you'd expect. That's enough to satisfy my curiosity, although I'd still like to see whether there's a point where it falls all the way to zero.
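For anyone following along, "percentage of sorts that were error-free" is an exact-match metric: one misplaced element makes the whole sort count as a failure. A minimal sketch, assuming each test case is a list of integers and the model's output has already been decoded back into a list (the function name and data format are hypothetical, not the evaluation code behind the figure):

```python
def exact_sort_accuracy(inputs, outputs):
    """Fraction of model outputs that exactly equal the correctly sorted input.

    A single wrong or misplaced element makes the whole sort count as an
    error, so this is stricter than per-element accuracy.
    """
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same length")
    if not inputs:
        return 0.0
    correct = sum(1 for xs, ys in zip(inputs, outputs) if sorted(xs) == ys)
    return correct / len(inputs)


# Hypothetical example: two of the three predicted sorts are perfect.
inputs = [[42, 7, 19], [3, 88, 15], [60, 5]]
outputs = [[7, 19, 42], [3, 88, 15], [5, 60]]
accuracy = exact_sort_accuracy(inputs, outputs)
print(accuracy, "imperfect fraction:", 1 - accuracy)
```

The "fraction of imperfect sorts" is then just one minus this number.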


>> Where do I say that?

In your comment above:

(...) is expressed a little bit more clearly as _the LLM is learning an n-gram_ that produces correct sorts (...)

(My underlining)

You also use it in a similarly unusual way throughout your linked Substack post, for example, you write:

the way GPT works is, in a certain sense, functionally equivalent to an n-gram, but that doesn’t mean GPT is an n-gram.

Where does this use of "n-gram" come from? I mean, did you see it somewhere? I'm curious, where?

>> The tokenizer would throw an exception, because it doesn't have any tokens to represent alphabetical characters. But you tell me - if I had tokenized alphabetical characters and defined an ordering, would you expect the results to be any different?

I'm sorry, I don't understand. "Defined an ordering", where?

You can change your tokenizer, but that will not change the trained model, obviously. So if you take your model that's trained on two-digit lists of integers and you run it on lists of any other type of elements it will not be able to sort them correctly. But isn't that what you claim? That:

"the LLM's training molds it into representing an actual sorting algorithm that would correctly generalize to any input list"

"Any input list"? How so?

> In your comment above

Oh, I see, good catch. I think that comment was the result of a botched edit; I do that sometimes. Too late to change it now. Sorry for the confusion!

> Where does this use of "n-gram" come from? I mean, did you see it somewhere?

It's shorthand for an n-gram Markov model, the same way the term is used in, for example, Shannon's A Mathematical Theory of Communication.
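To make the shorthand concrete: an n-gram Markov model in Shannon's sense picks each next symbol from a distribution that depends only on the previous n-1 symbols, and the textbook way to estimate it is by counting. A minimal character-level sketch (function names are hypothetical; no smoothing, and nothing here describes how GPT itself is implemented):

```python
import random
from collections import Counter, defaultdict


def train_ngram(text, n):
    """Count how often each character follows each (n-1)-character context."""
    counts = defaultdict(Counter)
    for i in range(len(text) - n + 1):
        context, nxt = text[i:i + n - 1], text[i + n - 1]
        counts[context][nxt] += 1
    return counts


def next_char_distribution(counts, context):
    """Estimate P(next char | context) from the raw counts (no smoothing)."""
    c = counts.get(context)
    if not c:
        return {}
    total = sum(c.values())
    return {ch: k / total for ch, k in c.items()}


def generate(counts, n, seed, steps, rng):
    """Extend `seed` one character at a time, conditioning only on the last n-1 chars.

    `seed` should be at least n-1 characters long.
    """
    out = seed
    for _ in range(steps):
        dist = next_char_distribution(counts, out[len(out) - (n - 1):])
        if not dist:
            break  # context never seen in training: nothing to sample from
        chars, probs = zip(*dist.items())
        out += rng.choices(chars, weights=probs)[0]
    return out


counts = train_ngram("abababac", 3)
print(next_char_distribution(counts, "ba"))  # {'b': 0.6666666666666666, 'c': 0.3333333333333333}
print(generate(counts, 3, "ab", 5, random.Random(0)))
```

The "functionally equivalent" point in the post is about this conditional-distribution view, not about GPT literally storing a count table like the one above.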

> "Defined an ordering", where?

In order for a set to be sortable, you need to define an ordering over its elements: for example, declaring that the letter 'A' is greater than the number '99'. It's easy to take for granted that 1 < 2, but the neural network doesn't know that a priori, because the tokens are just index values. It has no way to know that token number 5 represents the character '5'.
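Two things are going on here, and a small sketch may help separate them. First, a token ID is just a position in the vocabulary; '5' maps to 5 only because of the order the vocabulary happens to be written in (the index order below is an assumption). Second, sorting mixed elements requires someone to choose an ordering. `mixed_key` is a hypothetical choice in which every integer sorts below every string, which makes 'A' greater than 99:

```python
import random

# Character set quoted in the thread; the index order is assumed for illustration.
VOCAB = "0123456789,():[]_\n"


def token_ids(text, vocab):
    """Map each character to its position in `vocab`."""
    lookup = {ch: i for i, ch in enumerate(vocab)}
    return [lookup[ch] for ch in text]


# With this vocabulary order, '5' happens to get ID 5...
print(token_ids("5", VOCAB))  # [5]
# ...but any permutation of the vocabulary is an equally valid tokenizer,
# and the network only ever sees the IDs.
shuffled = "".join(random.Random(0).sample(VOCAB, len(VOCAB)))
print(token_ids("5", shuffled))


def mixed_key(item):
    """Hypothetical total order: all ints before all strings;
    ints compare numerically, strings alphabetically."""
    if isinstance(item, int):
        return (0, item, "")
    return (1, 0, item)


print(sorted(["B", 99, "A", 5], key=mixed_key))  # [5, 99, 'A', 'B']
```

A different `mixed_key` (say, letters before numbers) would be just as legitimate; the point is that the ordering has to come from somewhere outside the token IDs.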

> if you take your model that's trained on two-digit lists of integers and you run it on lists of any other type of elements it will not be able to sort them correctly.

To reiterate, the token dictionary basically just contains the characters "0123456789,():[]_\n". If you ask it to sort '(Tuesday, Monday)', it's just going to throw an exception, because 'T' isn't a recognized token; it doesn't have a corresponding index. It's not even a question of whether it can sort them correctly or incorrectly.
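A character-level encoder over that vocabulary fails before the model ever runs. A minimal sketch, assuming the vocabulary's index order matches the string above; the `encode` helper and the choice of `ValueError` are hypothetical (a bare dict lookup would raise `KeyError` instead):

```python
# Character set quoted above; index order assumed.
VOCAB = "0123456789,():[]_\n"
STOI = {ch: i for i, ch in enumerate(VOCAB)}


def encode(text):
    """Map each character to its token index, failing on characters with no token."""
    unknown = sorted({ch for ch in text if ch not in STOI})
    if unknown:
        raise ValueError(f"no token for characters: {unknown}")
    return [STOI[ch] for ch in text]


print(encode("[42,7]"))  # [14, 4, 2, 10, 7, 15]
try:
    encode("(Tuesday, Monday)")
except ValueError as err:
    print(err)  # lists every unsupported character, including 'T' and the space
```

Note that the space in '(Tuesday, Monday)' is also outside this vocabulary, so that input would be rejected even before the letters come into play.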

> "Any input list"? How so?

I think the meaning is pretty clear. No algorithm can sort a list whose elements aren't members of a totally ordered set, so by "any input list" I wasn't implying that a neural network could somehow get around this limitation.
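The total-order requirement is easy to see with a relation that isn't one. Below, a hypothetical rock-paper-scissors comparison is cyclic (rock < paper < scissors < rock), so no arrangement satisfies "no later element is less than an earlier one". A brute-force sketch, fine for tiny inputs only:

```python
from itertools import permutations

# Hypothetical non-transitive relation: (winner, loser) pairs.
BEATS = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}


def less_than(a, b):
    """a < b if b beats a. This gives rock < paper < scissors < rock: a cycle."""
    return (b, a) in BEATS


def sorted_arrangements(items, lt):
    """All orderings in which no later element is less than an earlier one."""
    return [
        p for p in permutations(items)
        if not any(lt(p[j], p[i]) for i in range(len(p)) for j in range(i + 1, len(p)))
    ]


print(sorted_arrangements(["rock", "paper", "scissors"], less_than))  # []
print(sorted_arrangements([2, 1, 3], lambda a, b: a < b))  # [(1, 2, 3)]
```

With an ordinary total order there is exactly one valid arrangement of distinct elements; with the cycle there is none, so there is no "correct" output for any sorter, learned or hand-written.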