


	by evgen 433 days ago
	This is one of those subtle clues that the LLM does not actually 'know' anything. It is providing you the best consensus answer to your prompt using the data upon which the weights rest; if that data was input primarily as English, then you are going to get better results asking in English. It is still Searle's Chinese Room, except you need to first go to the 'Language X -> English' room and then deliver its output to the general query room before delivering the next result to the 'English -> Language X' room.

6 comments

jug 433 days ago

Anthropic's research did find that Claude seemed to have an inner, language-agnostic "language", though. It also found that the larger an LLM got, the better it could capture the meaning of words across language barriers and expand its internal, language-independent representation.

So part of the improvement as models grow in parameter count is probably due not only to the larger body of training material, but also to a greater ability to "realize" and connect the apparent meanings of words, so that a German speaker might benefit more and more from training material in Korean.

> These results show that features at the beginning and end of models are highly language-specific (consistent with the {de, re}-tokenization hypothesis [31]), while features in the middle are more language-agnostic. Moreover, we observe that compared to the smaller model, Claude 3.5 Haiku exhibits a higher degree of generalization, and displays an especially notable generalization improvement for language pairs that do not share an alphabet (English-Chinese, French-Chinese).

Source: https://transformer-circuits.pub/2025/attribution-graphs/bio...

However, they do see that Claude 3.5 Haiku seemed to have an English "default" with more direct connections. Is it possible that an LLM has to take a more roundabout route via generalizations to communicate in other languages, and that this causes a bigger drop-off in performance the smaller the model is?


ako 432 days ago

Sounds like it is capable of thinking in abstract concepts instead of words that are related/connected? So that training material in different languages would all add to knowledge on the same concepts?

It is like the difference between a student who is brilliant at learning by heart and repeating the words they studied without understanding the concept, and a student who actually understands the topic and can reason about the concepts.


numpad0 433 days ago

Modern Standard Chinese is almost syntactically "identical" to English, for some reason. French heavily shaped the medieval language that became modern English.

My point is, those language pairs aren't random examples. Chinese isn't something completely foreign and new when it comes to how it differs from English.


vjerancrnjak 433 days ago

Exactly. I found it surprising how quickly people assumed that "Imagine you're the smartest and most creative person in the world, ..." would somehow result in the most creative output.

It's clear from the start that language modelling is not yet there. It can't reason about low-level structure (letters, syllables, rhyme, rhythm), and it can't map all languages to a single clear representation. The representation is a mushy, distributed mess out of which you get good or bad results.

It's brilliant how relevant the responses are and when they're correct, but the underlying process is driven by very weird internal representations.


sorenjan 433 days ago

It would be great if we could get to a point where we can use a language encoder and decoder, with a language-agnostic knowledge model in between. But since it's generally more efficient to train the whole model end to end, such modularity would probably come at a performance price, and I don't see any private (or "non-profit") companies taking that approach anytime soon.
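To make the encoder / knowledge model / decoder split concrete, here is a toy sketch. Everything in it is hypothetical: the tiny lexicons, the concept IDs, and the function names are made up for illustration, and a real system would use learned representations rather than dictionaries. It only shows the shape of the modular pipeline described above, not how any existing model works.

```python
# Toy illustration of "language encoder -> language-agnostic model -> language decoder".
# All names and data here are hypothetical; this is not how any real LLM is built.

# Language-specific encoders: surface word -> language-neutral concept ID.
LEXICONS = {
    "en": {"small": "SMALL", "big": "BIG", "hot": "HOT", "cold": "COLD"},
    "fr": {"petit": "SMALL", "grand": "BIG", "chaud": "HOT", "froid": "COLD"},
    "de": {"klein": "SMALL", "groß": "BIG", "heiß": "HOT", "kalt": "COLD"},
}

# Language-agnostic "knowledge": stored once, over concepts, not over words.
ANTONYMS = {"SMALL": "BIG", "BIG": "SMALL", "HOT": "COLD", "COLD": "HOT"}


def encode(word: str, lang: str) -> str:
    """Map a word in the given language to a concept ID."""
    return LEXICONS[lang][word]


def decode(concept: str, lang: str) -> str:
    """Map a concept ID back to a word in the given language."""
    inverse = {c: w for w, c in LEXICONS[lang].items()}
    return inverse[concept]


def opposite(word: str, lang_in: str, lang_out: str) -> str:
    """Answer 'what is the opposite of <word>?' across languages."""
    concept = encode(word, lang_in)   # language-specific step
    answer = ANTONYMS[concept]        # language-agnostic step
    return decode(answer, lang_out)   # language-specific step


if __name__ == "__main__":
    print(opposite("petit", "fr", "de"))
    print(opposite("hot", "en", "fr"))
```

In this toy version the antonym table is written once and every language benefits from it, which is the appeal of the modular design; the cost is that the three stages cannot adapt to each other the way jointly trained layers can.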


TimPC 433 days ago

My supervising professor for the PhD program I left did a paper on the Chinese Room and argued that, to a large degree, understanding a task is the ability to compress it by many orders of magnitude. In that sense the LLMs are succeeding because, despite their supposedly massive parameter sets, they are absolutely tiny compared to the Chinese Room version.
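A rough back-of-the-envelope version of that compression argument, under loudly stated assumptions: the room is read as an exhaustive lookup table with one entry per possible input (Searle's rule book need not be one), and the vocabulary size, input length, and parameter count below are hypothetical round numbers, not figures for any real model.

```python
import math

# Hypothetical round numbers, chosen only for illustration.
VOCAB_SIZE = 50_000        # assumed number of distinct tokens
CONTEXT_TOKENS = 100       # assumed length of an input, in tokens
PARAMETER_COUNT = 10**12   # assumed model size


def log10_lookup_table_entries(vocab_size: int, context_tokens: int) -> float:
    """log10 of the number of distinct inputs: vocab_size ** context_tokens."""
    return context_tokens * math.log10(vocab_size)


def orders_of_magnitude_saved(vocab_size: int, context_tokens: int,
                              parameter_count: int) -> float:
    """How many orders of magnitude smaller the model is than the table."""
    table = log10_lookup_table_entries(vocab_size, context_tokens)
    return table - math.log10(parameter_count)


if __name__ == "__main__":
    table = log10_lookup_table_entries(VOCAB_SIZE, CONTEXT_TOKENS)
    saved = orders_of_magnitude_saved(VOCAB_SIZE, CONTEXT_TOKENS, PARAMETER_COUNT)
    print(f"lookup table: about 10^{table:.0f} entries")
    print(f"model: 10^{math.log10(PARAMETER_COUNT):.0f} parameters")
    print(f"difference: about {saved:.0f} orders of magnitude")
```

Working in logarithms avoids building the astronomically large integer; the exact numbers matter less than the size of the gap between the two.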


keeganpoppen 433 days ago

Searle's "Chinese Room" was as wrong then as it is now


justlikereddit 433 days ago

Similar to or better than the performance of most so-called humans, so I guess we're all a collection of Chinese Room switchboxes.
