Hacker News
by coliveira 25 days ago
The issue is that French, Italian, African, Japanese people shouldn't have the inconvenience of instructing the LLM tool to get the basic facts about their own culture. They should use an LLM that has already been trained like that by default. Nobody has obligation to use a tool that thinks it is talking to an American. If I go to Google, for example, I want to get facts about my own country in my own language.
3 comments

Wouldn't those people be asking the questions in their own language in the first place? The model will reply in the language you use. This thread is about people asking for information about a language that is not the one they are messaging the LLM in.
Even if the model will reply in my language, I often notice it searching in English. Or thinking in English. There's always something lost in translation. Sometimes it's just minor nuances. Other times it mangles the legal facts with those of other countries.
This sounds like the problem of people calling "911" as the emergency number which they see in so much US-American media but which is not the emergency number in their own country.
I remember being bored as a teenager on a family holiday to New Zealand in the 1990s, so I went and dialled 911 from a payphone to see what would happen. I got a recorded message saying that in New Zealand, the emergency number isn’t 911, it is 111. Dialling 000 (the Australian emergency number) produced a similar recorded message.
In a lot of countries, they redirect the number or play a voice message pointing to the correct emergency number.
They always sound like an obnoxious American tourist talking through a translator. The chatbot training dataset is the same, and foundation models are always built with >50% American English data for some reason.
> Nobody has obligation to use a tool that thinks it is talking to an American.

Very very emphatic agree from my end, thanks.

> Nobody has obligation to use a tool that thinks it is talking to an American.

Then add top-level instructions saying what country you're from, what country you live in now, and which language you speak. This isn't that hard.
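For what it's worth, here is a minimal sketch of what such top-level instructions could look like if you generate them rather than type them by hand. The function name `build_locale_instructions` and the exact wording are hypothetical, not taken from any particular LLM product; the idea is just to state origin, residence, and language once, up front.

```python
def build_locale_instructions(origin_country: str,
                              residence_country: str,
                              language: str) -> str:
    """Compose a plain-text preamble describing the user's locale.

    Hypothetical helper: the wording is an illustration, not a format
    that any specific model or vendor requires.
    """
    return (
        f"I am from {origin_country} and currently live in {residence_country}. "
        f"Reply in {language}. "
        f"Where laws, emergency numbers, or institutions differ by country, "
        f"use those of {residence_country} unless I say otherwise."
    )


if __name__ == "__main__":
    # Example values only; paste the result into whatever "custom
    # instructions" or system-prompt field your tool offers.
    print(build_locale_instructions("Brazil", "France", "French"))
```

Whether a model actually honors a preamble like this will vary, so treat it as a starting point rather than a guarantee.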

None of that even addresses the problem described, because none of the languages you mentioned would be French in the described example.