by YeGoblynQueenne
1095 days ago
I'm curious, why are you using "n-gram" as if you're referring to a model? You say e.g. that "LLM is learning an n-gram". N-grams are features, not models. You can train an n-gram model, or you can train a language model using n-grams as features, and so on, but you can't "learn an n-gram". Where did you find this terminology?

EDIT:

> and Hyp2 is that the LLM's training molds it into representing an actual sorting algorithm that would correctly generalize to any input list.

Btw, you have not shown anything like that. You trained and tested on lists of two-digit positive integers expressible in 128 characters. That's not "any input list". For instance, what do you think would happen if you gave your model an alphanumeric list to sort? Did you try that? Your model also doesn't correctly generalise, not even to its own training set that you tested it on. There's plenty of error in the figure where you show its accuracy (not clear if that's training or test accuracy). It's not clear to me how you account for those obvious limitations of your model (it's a toy model after all) when you claim that it "learned to implement a sorting algorithm" etc. It would be great if you could clarify that.
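To make the features-versus-models distinction concrete, here is a minimal sketch in Python. It assumes whitespace tokenization, and the function names `ngrams` and `train_bigram_model` are hypothetical, not taken from either commenter's code. Extracting n-grams involves no learning; a bigram model is something fitted to counts of those features.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return the n-grams (as tuples) in a token sequence.

    These are features: extracting them involves no learning at all.
    """
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train_bigram_model(corpus):
    """Estimate P(next | prev) by maximum likelihood from bigram counts.

    This is a model: it is fitted to data, and bigrams are its features.
    """
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()  # assumption: whitespace tokenization
        for prev, nxt in ngrams(tokens, 2):
            counts[prev][nxt] += 1
    model = {}
    for prev, nexts in counts.items():
        total = sum(nexts.values())
        model[prev] = {w: c / total for w, c in nexts.items()}
    return model


corpus = ["the cat sat", "the cat ran", "the dog sat"]
print(ngrams("the cat sat".split(), 2))  # [('the', 'cat'), ('cat', 'sat')]
model = train_bigram_model(corpus)
print(model["the"])  # {'cat': 0.6666666666666666, 'dog': 0.3333333333333333}
```

The same `ngrams` output could just as well feed a classifier or a neural language model as input features, which is the point being made: the n-gram is the representation, not the thing being trained.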
The tokenizer would throw an exception, because it doesn't have any tokens to represent alphabetical characters. But you tell me: if I had tokenized alphabetical characters and defined an ordering for them, would you expect the results to be any different?
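The failure mode described here can be illustrated with a minimal character-level tokenizer whose vocabulary covers only digits and list punctuation. This is a sketch under assumptions: the vocabulary `VOCAB`, the `encode` function, and the choice of `ValueError` are hypothetical and not the author's actual tokenizer.

```python
# Hypothetical vocabulary: digits plus the punctuation needed to write a list.
VOCAB = list("0123456789[], ")
STOI = {ch: i for i, ch in enumerate(VOCAB)}


def encode(text):
    """Map each character to its token id; fail on characters outside VOCAB."""
    ids = []
    for ch in text:
        if ch not in STOI:
            raise ValueError(f"no token for character {ch!r}")
        ids.append(STOI[ch])
    return ids


print(encode("[42, 7]"))  # [10, 4, 2, 12, 13, 7, 11]
try:
    encode("[b, a]")
except ValueError as err:
    print(err)  # no token for character 'b'
```

Supporting alphabetical input would mean adding letters to `VOCAB` and deciding how they order relative to digits, which is the hypothetical the reply raises.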
> You say e.g. that "LLM is learning an n-gram"[...] you can't "learn an n-gram".
Where do I say that? I don't think I make any reference to "learning an n-gram", which is a relief because I don't know what it would mean to "learn an n-gram".
> There's plenty of error in the figure where you show its accuracy (not clear if that's training or test accuracy).
Test accuracy, measured between training iterations (not part of the training process itself, which uses its own validation set split off from the training set). And yes, I agree, it is not error-free, and I wouldn't expect it to be, especially after so little training. What the figure shows is the percentage of sorts that were not error-free, and how rapidly that decreases. I've since repeated the test at a finer resolution, and the fraction of imperfect sorts continues to decrease about as you'd expect, which is enough to satisfy my curiosity, although I'm a little curious to see whether there is some point where it falls completely to zero.
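The metric described here, the share of test lists the model sorts imperfectly, can be sketched as follows. This is an illustration under stated assumptions: `model_sort` stands in for whatever decoding the author used and is hypothetical, as are `make_test_lists` and `flawed_sort`, and the exact-match criterion (a sort counts as imperfect if its output differs in any way from `sorted(xs)`) is an assumed definition of "error-free".

```python
import random


def imperfect_sort_rate(model_sort, test_lists):
    """Fraction of test lists for which model_sort(xs) != sorted(xs).

    Assumed exact-match criterion: a single misplaced, dropped, or
    duplicated element makes the whole sort count as imperfect.
    """
    if not test_lists:
        raise ValueError("test_lists must be non-empty")
    failures = sum(1 for xs in test_lists if model_sort(xs) != sorted(xs))
    return failures / len(test_lists)


def make_test_lists(n_lists, max_len=10, seed=0):
    """Hypothetical test set: random lists of two-digit positive integers."""
    rng = random.Random(seed)
    return [[rng.randint(10, 99) for _ in range(rng.randint(1, max_len))]
            for _ in range(n_lists)]


def flawed_sort(xs):
    """Hypothetical stand-in for a model that drops the last element of long lists."""
    out = sorted(xs)
    return out[:-1] if len(out) > 5 else out


tests = make_test_lists(1000)
print(imperfect_sort_rate(sorted, tests))  # 0.0
print(imperfect_sort_rate(flawed_sort, tests))
```

Tracking this rate at each evaluation checkpoint gives the kind of curve described, and whether it ever reaches exactly zero is then a question about the test set as much as about the model.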