Hacker News
by ashkankiani 666 days ago
It's pretty cool that I can read "anablibg" and know that means "enabling." The brain is pretty neat. I wonder if LLMs would get it too. They probably would.
4 comments

Question I wrote:

> I encountered the typo "anablibg" in the sentence "I wonder how much help they had by asahi doing a lot of the kernel and ecosystem work anablibg 16k pages." What did they actually mean?

GPT-4o and Sonnet 3.5 understood it perfectly. This isn't really a problem for the large models.

For local small models:

* Gemma2 9b did not get it and thought it meant "analyzing".

* Codestral (22b) did not get it and thought it meant "allocating".

* Phi3 Mini failed spectacularly.

* Phi3 14b and Qwen2 did not get it and thought it was "annotating".

* Mistral-nemo thought it was a portmanteau "anabling" as a combination of "an" and "enabling". Partial credit for being close and some creativity?

* Llama3.1 got it perfectly.
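
For a rough sense of how far each guess is from the typo at the letter level, here is a minimal sketch using plain Levenshtein edit distance. This is only an illustration and not how any of the models above work: it ignores sentence context entirely, and the candidate list is just the guesses reported in the bullets plus the intended word.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


TYPO = "anablibg"
# The intended word plus the guesses the small models produced.
CANDIDATES = ["enabling", "analyzing", "allocating", "annotating"]


def closest(typo: str, candidates: list[str]) -> str:
    """Return the candidate with the smallest edit distance to the typo."""
    return min(candidates, key=lambda word: levenshtein(typo, word))


if __name__ == "__main__":
    for word in sorted(CANDIDATES, key=lambda w: levenshtein(TYPO, w)):
        print(word, levenshtein(TYPO, word))
```

"enabling" is two substitutions away (a for e, b for n), and every other guess needs more edits, so the wrong answers are not closer by spelling. They look more like guesses at a word that fits the sentence.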

I wonder if they'd do better if there was the context that it's in a thread titled "Adding 16 kb page size to Android"? The "analyzing" interpretation is plausible if you don't know what 16k pages, kernels, Asahi, etc. are.
Seems like there is a bit of a roll of the dice there. The ones that got it right may have just been lucky.
Ran it a few times in new sessions, 0 failures so far.
I wonder how much of a test this is for the LLM vs whatever tokenizer/preprocessing they're doing.
Is there any task Gemma is better at compared to others?
Local LLM topics are a treadmill of “what’s best and what is preferred” changing basically weekly to monthly. It’s a rapidly evolving field, but right now I tend to gravitate to Gemma2 9b for coding assistance with TypeScript work or general question-and-answer stuff. Its embedded knowledge and speed on the computers that I have (32GB M2 Max, 16GB M1 Air, 4080 gaming desktop) make for a good balance while also using the computer for other stuff. Bigger models limit what else I can run simultaneously and are slower than my reading speed; smaller models have less utility, and the speed increase is pointless if they’re dumb.
fwiw I failed to figure it out as a human, I had to check the replies.
I asked ChatGPT and it did get it.

Personally, when I read the comment my brain kinda skipped over the word. Since it contained the part "lib", I assumed it was some obscure library that I didn't care about. It doesn't fit grammatically, but I didn't give it enough thought to notice.

Until I read your comment I didn't even notice...
LLMs are at a great disadvantage here because they operate on tokens, not letters.
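
To make the tokens-vs-letters point concrete, here is a toy sketch of greedy longest-match subword splitting. The vocabulary below is entirely hypothetical and hand-picked for this example. Real tokenizers learn vocabularies with tens of thousands of pieces and may split these words differently, so treat the output as an illustration of the idea only.

```python
# Hypothetical toy vocabulary, made up for this illustration.
VOCAB = {"enabling", "en", "an", "ab", "lib", "ling", "g"}


def greedy_tokenize(word: str, vocab: set[str] = VOCAB) -> list[str]:
    """Split a word by repeatedly taking the longest piece found in vocab.

    Falls back to a single character when no vocabulary piece matches.
    """
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest slice first
            piece = word[i:j]
            if piece in vocab or j == i + 1:  # single-char fallback
                tokens.append(piece)
                i = j
                break
    return tokens


if __name__ == "__main__":
    print(greedy_tokenize("enabling"))  # one familiar piece
    print(greedy_tokenize("anablibg"))  # several unrelated pieces
```

With this made-up vocabulary the correct word is a single token while the typo breaks into "an", "ab", "lib", "g". The model never sees that the two strings differ by only two letters, only that the token sequences look nothing alike.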
I remember reading somewhere that LLMs are actually fantastic at reading heavily mistyped sentences! Mistyped to a level where humans actually struggle.

(I will update this comment if I find a source)

Tihs probably refers to comon mispelllings an typo's.
It's actually not. You can scramble every letter within words and it can mostly unscramble it. Keep the first letter and it recovers almost 100%.
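
If anyone wants to try this themselves, here is a minimal sketch for producing the scrambled input. The function names and the `keep_first` flag are my own, not from any paper or library. It shuffles the letters of each alphabetic run and leaves punctuation and spacing alone.

```python
import random
import re


def scramble_word(word: str, rng: random.Random, keep_first: bool = True) -> str:
    """Shuffle the letters of one word, optionally pinning the first letter."""
    head = word[:1] if keep_first else ""
    rest = list(word[1:] if keep_first else word)
    rng.shuffle(rest)  # in-place shuffle using the seeded generator
    return head + "".join(rest)


def scramble_text(text: str, seed: int = 0, keep_first: bool = True) -> str:
    """Scramble every run of ASCII letters in text; other characters are untouched."""
    rng = random.Random(seed)  # seeded so a given prompt is reproducible
    return re.sub(r"[A-Za-z]+",
                  lambda m: scramble_word(m.group(0), rng, keep_first),
                  text)


if __name__ == "__main__":
    sentence = "I wonder how much help they had by asahi doing a lot of the kernel work."
    print(scramble_text(sentence, seed=1, keep_first=True))
    print(scramble_text(sentence, seed=1, keep_first=False))
```

Paste either output into a model with "what does this say?" and compare how well it recovers the sentence with and without the first letter kept.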