	by ashkankiani 666 days ago
	It's pretty cool that I can read "anablibg" and know that means "enabling." The brain is pretty neat. I wonder if LLMs would get it too. They probably would.

4 comments

evilduck 666 days ago

Question I wrote:

> I encountered the typo "anablibg" in the sentence "I wonder how much help they had by asahi doing a lot of the kernel and ecosystem work anablibg 16k pages." What did they actually mean?

GPT-4o and Sonnet 3.5 understood it perfectly. This isn't really a problem for the large models.

For local small models:

* Gemma2 9b did not get it and thought it meant "analyzing".

* Codestral (22b) did not get it and thought it meant "allocating".

* Phi3 Mini failed spectacularly.

* Phi3 14b and Qwen2 did not get it and thought it was "annotating".

* Mistral-nemo thought it was a portmanteau "anabling" as a combination of "an" and "enabling". Partial credit for being close and some creativity?

* Llama3.1 got it perfectly.

treyd 666 days ago

I wonder if they'd do better with the context that it's in a thread titled "Adding 16 kb page size to Android"? The "analyzing" interpretation is plausible if you don't know what 16k pages, kernels, Asahi, etc. are.

jandrese 666 days ago

Seems like there is a bit of a roll of the dice there. The ones that got it right may have just been lucky.

HeatrayEnjoyer 666 days ago

Ran it a few times in new sessions, 0 failures so far.
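
To put a number on the "roll of the dice" worry, here is a minimal sketch of how likely a clean streak is if a model only gets the typo right some fraction of the time. It assumes each new session is an independent try with the same success probability; the function name `streak_probability`, the sample probabilities, and the run count of 5 are illustrative choices, not figures from the thread.

```python
def streak_probability(p_correct, runs):
    """Chance of getting every one of `runs` independent attempts right,
    given a per-attempt success probability of `p_correct`."""
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be between 0 and 1")
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return p_correct ** runs


# Hypothetical per-attempt success rates, each tried over 5 fresh sessions.
for p in (0.5, 0.8, 0.95):
    print(f"p={p}: chance of 5/5 correct = {streak_probability(p, 5):.3f}")
```

Under these assumptions a coin-flip model rarely survives even a handful of runs, so repeated clean results do suggest more than luck, though a few runs cannot separate a reliable model from a mostly reliable one.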

slaymaker1907 666 days ago

I wonder how much of a test this is for the LLM vs whatever tokenizer/preprocessing they're doing.

Alifatisk 666 days ago

Is there any task Gemma is better at compared to others?

evilduck 666 days ago

Local LLM topics are a treadmill: “what’s best and what is preferred” changes basically weekly to monthly because it’s a rapidly evolving field. Right now, though, I tend to gravitate to Gemma2 9b for coding assistance with TypeScript work or general question-and-answer stuff. Its embedded knowledge and speed on the computers that I have (32GB M2 Max, 16GB M1 Air, 4080 gaming desktop) make for a good balance while also using the computer for other stuff. Bigger models limit what else I can run simultaneously and are slower than my reading speed; smaller models have less utility, and the speed increase is pointless if they’re dumb.

Retr0id 666 days ago

fwiw I failed to figure it out as a human, I had to check the replies.

im3w1l 666 days ago

I asked chatgpt and it did get it.

Personally, when I read the comment my brain kinda skipped over the word: since it contained the part "lib", I assumed it was some obscure library that I didn't care about. It doesn't fit grammatically, but I didn't give it enough thought to notice.

mrbuttons454 666 days ago

Until I read your comment I didn't even notice...

mrob 666 days ago

LLMs are at a great disadvantage here because they operate on tokens, not letters.
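
A rough illustration of the tokens-versus-letters point: the toy splitter below uses greedy longest-match against a tiny made-up vocabulary. Real models use learned subword vocabularies (BPE and similar) that behave differently in detail; `TOY_VOCAB` and `toy_tokenize` are hypothetical names for this sketch only.

```python
# Hypothetical toy vocabulary; real tokenizers learn tens of thousands of pieces.
TOY_VOCAB = {"enabling", "en", "an", "abl", "ing", "lib"}


def toy_tokenize(word, vocab=TOY_VOCAB):
    """Split `word` greedily into the longest known pieces,
    falling back to single characters when nothing matches."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking down to one character.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


print(toy_tokenize("enabling"))  # ['enabling']
print(toy_tokenize("anablibg"))  # ['an', 'abl', 'i', 'b', 'g']
```

In this toy setup the correctly spelled word is one piece while the typo shatters into five unrelated ones, so the model never sees that the two strings differ by only a few letters.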

platelminto 666 days ago

I remember reading somewhere that LLMs are actually fantastic at reading heavily mistyped sentences! Mistyped to a level where humans actually struggle.

(I will update this comment if I find a source)

thanatropism 666 days ago

Tihs probably refers to comon mispelllings an typo's.

HeatrayEnjoyer 666 days ago

It's actually not. You can scramble every letter within words and it can mostly unscramble it. Keep the first letter and it recovers almost 100%.
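
For anyone who wants to try the scrambling experiment, a minimal sketch for generating test input: it shuffles the letters of each word and can pin the first letter in place. The names `scramble_word` and `scramble_sentence` and the seed parameter are made up for this example, and punctuation attached to a word is shuffled along with its letters.

```python
import random


def scramble_word(word, rng, keep_first=True):
    """Shuffle the letters of one word, optionally pinning the first letter.
    Words shorter than three characters are left alone."""
    if len(word) < 3:
        return word
    if keep_first:
        head, tail = word[0], list(word[1:])
    else:
        head, tail = "", list(word)
    rng.shuffle(tail)
    return head + "".join(tail)


def scramble_sentence(sentence, seed=0, keep_first=True):
    """Scramble every whitespace-separated word; a fixed seed makes it repeatable."""
    rng = random.Random(seed)
    return " ".join(scramble_word(w, rng, keep_first) for w in sentence.split())


print(scramble_sentence("enabling sixteen kilobyte pages"))
print(scramble_sentence("enabling sixteen kilobyte pages", keep_first=False))
```

Pasting the output into a model and asking for the original sentence reproduces the test described above; comparing the two modes shows how much the pinned first letter helps.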
