


	by miket 1271 days ago
	Any question asking about the letters of words is bound to underwhelm because GPT3 is trained on sub-word tokens, so it does not have random access to individual letters. The word "prime" is tokenized as a single token, instead of 5 characters: https://beta.openai.com/tokenizer


johnfn 1271 days ago

I’ve seen this said before, but it’s quite clear from the responses that it understands that prime has 5 letters.


charcircuit 1271 days ago

Now try a gibberish word. A human could count the number of letters but ChatGPT can only approximate it.


umanwizard 1270 days ago

Me: How many letters are in the word “smorgle” ?

GPT: There is no word "smorgle" in the English language.

Me: I never said it was English.

GPT: I apologize for assuming that "smorgle" was a word in the English language. Without any further context or information about the word "smorgle," I am unable to determine how many letters it has or provide any additional information about it.

Me: Count the letters

GPT: If you would like me to count the letters in the word "smorgle," it has 7 letters.


SV_BubbleTime 1270 days ago

I’m so used to double-checking anything it says that I quickly confirmed that smorgle does indeed have 7 letters.


dwaltrip 1270 days ago

Me: I’ve made up a word called “mlaizapf”. Can you count the number of letters?

ChatGPT: Sure! "Mlaizapf" has 8 letters.

Me: How about this word: “spoofydoofaplixader”

ChatGPT: "Spoofydoofaplixader" has 18 letters.

Me: How about the following words: "ghottiaopepinopepotorzey", "zazzamataz", and "maaq"

ChatGPT: Here are the number of letters in each word:

* "Ghottiaopepinopepotorzey" has 23 letters.

* "Zazzamataz" has 10 letters.

* "Maaq" has 4 letters.

Is there anything else I can help with?

------

It got most of them correct. The exceptions are "spoofydoofaplixader", which has 19 letters, not 18, and "ghottiaopepinopepotorzey", which has 24 letters, not 23.

Very interesting... It seems similar to its math abilities, where it struggles with bigger numbers or more complex problems.
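
The counts in this exchange are easy to check mechanically. A minimal Python sketch, with the words and ChatGPT's answers copied from the transcript above (the names `claimed` and `check_counts` are only for illustration):

```python
# Words and the letter counts ChatGPT reported, copied from the transcript above.
claimed = {
    "mlaizapf": 8,
    "spoofydoofaplixader": 18,
    "ghottiaopepinopepotorzey": 23,
    "zazzamataz": 10,
    "maaq": 4,
}


def check_counts(claims):
    """Return {word: (claimed, actual)} for every word whose claimed count is wrong."""
    return {w: (n, len(w)) for w, n in claims.items() if len(w) != n}


mismatches = check_counts(claimed)
for word, (said, actual) in mismatches.items():
    print(f"{word}: model said {said}, actual length is {actual}")
```

Only the two longest words come back as mismatches, each off by one, which fits the observation that it struggles more as the numbers get bigger.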


2OEH8eoCRo0 1270 days ago

I asked it a bunch of gibberish words and it got them all correct.


doubleunplussed 1270 days ago

My mental model is that if you give it real words, it uses approximately one token per word, and it may or may not know how many letters are in the word - it will have learned how many letters there are only if that information was in its training. Just like any other fact it learns about words. It is not counting the letters.

If you give it a gibberish word, it will represent it with roughly one token per letter, so it can more or less count tokens to figure out how many letters there are.

So this ends up looking like it can count letters in most words, real and fake. Perhaps it would do poorly with real but uncommon words.
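
This mental model can be pictured with a toy greedy longest-match tokenizer. To be clear, this is not GPT's actual tokenizer: the vocabulary `VOCAB` and the function `tokenize` are invented here purely for illustration.

```python
# Hypothetical vocabulary: a few whole words plus single letters as a fallback.
VOCAB = {"prime", "letter", "count"} | set("abcdefghijklmnopqrstuvwxyz")


def tokenize(word, vocab=VOCAB):
    """Greedy longest-match split of `word` into vocabulary entries."""
    tokens, i = [], 0
    while i < len(word):
        # Try the longest candidate first, shrinking until something matches.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary entry starts at position i (e.g. a non-letter character).
            raise ValueError(f"cannot tokenize {word[i]!r}")
    return tokens


print(tokenize("prime"))         # ['prime']
print(len(tokenize("smorgle")))  # 7
```

In this toy setup a known word collapses to one token, so its length is not visible from the token sequence, while a made-up word falls back to one token per letter and can be counted. Real sub-word tokenizers typically split unfamiliar strings into multi-character pieces rather than single letters, so counting tokens would only approximate the letter count.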


charcircuit 1270 days ago

> more or less count tokens

Which is what I meant by saying "approximate" because it can "count" the number of tokens.


marstall 1271 days ago

> it does not have random access to individual letters

This presumes it works by understanding the components of the question and reasoning about them. But it doesn't operate at that level; it just guesses the most likely next word based on statistical tricks. So it doesn't need to "know" about letters to generate a reasonable response involving letters.


ImprobableTruth 1270 days ago

What do you think hidden layers do?


marstall 1269 days ago

I'm not familiar with that. What is it?
