|
|
|
|
|
by HarHarVeryFunny
515 days ago
|
|
You're the one out of your depth ... LLMs are taught to predict. Once they've seen enough training samples of words being spelled, they'll have learnt that in a spelling context the tokens comprising the word predict the tokens comprising the spelling. Once they've learnt the letters predicted by each token, they'll be able to do this for any word (i.e. token sequence). Of course, you could just try it for yourself - ask an LLM to break a non-dictionary nonsense word like "asdpotyg" into a letter sequence. |
|
It does away with sub-word tokenization but is still more or less a transformer (no working memory or internal iteration). Mostly, the (performance) gains seem modest (not unanimous, some benchmarks it's a bit worse) ....until you hit anything to do with character level manipulation and it just stomps. 1.1% to 99% on CUTE - Spelling as a particularly egregious example.
I'm not sure what the problem is exactly but clearly something about sub-word tokenization is giving these models a particularly hard time on these sort of tasks.
https://arxiv.org/abs/2412.09871