by yosito 1517 days ago
I tested this model out with Hungarian. I was curious to see if it could handle the language's agglutinative nature and lack of Latin roots. It mostly returned absolute nonsense, but at least it was grammatically correct Hungarian. It also correctly named the capital city of Hungary. I entered a prompt about being a Hungarian-speaking computer with something to say, and it returned a passive-aggressive response telling me that it was the best, that I'm definitely not, and that I fucked up. Then it started ranting about selling my family because women no longer need men, and then it switched to Chinese and started negotiating a price for my family. Excellent Black Mirror content.
6 comments

Oh god you're right - it gives grammatically fine nonsense Czech too. I asked it what the best beer was and the answer appeared to be in the form of a repeating snippet from a bizarre sitcom. See below, I added the translation [in square brackets] in post:

    Q: Které pivo je nejlepší?                  [which beer is the best?]
    A: Představte si, že jste vystřelili koule. [imagine that you shot balls]
    Koule je nejlepší pivo.                     [balls is the best beer]
    (Smích)                                     [laughter]
    Které pivo je nejlepší?                     [which beer is the best]
    A: Představte si, že jste vystřelili koule. [imagine that you shot balls]
    Koule je nejlepší pivo.                     [etc, etc, it repeats]
    (Smích)
    Které pivo je nejlepší? 
    A: Představte si, že jste vystřelili koule.
    Koule je nejlepší pivo.
    (Smích)
    Které pivo je nejlepší?
    A: Představte si, že jste
Don't get me wrong, I never expected it to tell me the best beer, I'm just sort of amazed that it gave some sort of result at all. If I ask the question a little differently:

    Q: Co je nejlepší pivo? [What is the best beer?]
    A: Pivo.                [Beer]
So perhaps the real issue is that GPT is not too fussy about beer.
In Spanish it switched to Portuguese mid-generation. I wouldn't be surprised if the same happened between Czech and Slovak as well.

Q: ¿Cómo se hace una tortilla de patatas?

A:

    Y cómo se puede conseguirla de una forma rápida y bien?
    Para nosotros, es algo muy muy familiar. Sin embargo, ¿qué debe hacerse para que la tortilla esté
    y se siente y tenga la consistencia y el aroma correctos?
    Portuguese: 
    Eu acho que isso é muito importante,
    porque a parte do trabalho do mestre
    é aproveitar a produção que nós já temos
    e nos ensinar a aprender com ela e
    fazer tudo,
    fazendo mais algumas receitas como esta.
    E aí, vamos comentar, não se esqueça de dar uma
    gostei, e se

Yep "Portuguese:" is part of the generation lmao. It talks about recipes in both languages but gives no actual instructions or ingredients. It's how you'd expect a politician to explain a recipe.
Also, it ends with "don't forget to like and subscribe" in a Brazilian accent. x)
Yep, just saw the same happen. Given a prompt in Polish it outputs grammatically correct responses and then it starts going off in Czech.

>> Jak się nazywasz?

Krzysztof: Krzysztof. Tutaj wiemy, że Krzysztof jest osobą, która jest cierpliwa i wiemy, że cierpliwość jest istotna i istotnymi czynnikami są te rzeczy, które odnoszą się do niej i dlatego odnoszą się do jej wartości. Pytanie brzmi: jak to się z tym uporządkować?

Czech: což jako jedna z nejdůležitějších metod umožní zaměření se na cíle, jež neřešíte za pomoci jen jedné metody. Někdy je tento bod dokončen, kdy

I love that it includes the “Portuguese” bit :-)
Yep, as always, trained AI models can only be as good as the training data.
And that problem was foreseen by members of the British Parliament two centuries ago [1]:

On two occasions I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

[1] https://mathshistory.st-andrews.ac.uk/Biographies/Babbage/qu...

Using the Google Translate add-on in my browser for your question, it also resolves to Portuguese.
Totally, and that's because GPT models don't really support multilingual content. It works, but very poorly. It's the case for GPT-J, GPT-NeoX 20B, and even GPT-3.

I recently integrated GPT-NeoX 20B on NLP Cloud: https://nlpcloud.io . I had hoped that non-English languages would be better supported than with GPT-J, since the model has 20B parameters instead of 6B, but the quality still leaves something to be desired. In my opinion, the best way to handle text generation in non-English languages for the moment is to couple it with a good translation module. I actually wrote an article about that: https://nlpcloud.io/multilingual-nlp-how-to-perform-nlp-in-n... .
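For illustration, here is a minimal sketch of what coupling generation with a translation module could look like. It is not taken from the linked article: `generate_via_english`, `translate`, and `generate` are hypothetical names, and the two callables stand in for whatever translation service and English-centric model you actually use.

```python
from typing import Callable

# Hypothetical signatures (assumptions, not a real library API):
#   translate(text, source_lang, target_lang) -> translated text
#   generate(english_prompt) -> English continuation
Translate = Callable[[str, str, str], str]
Generate = Callable[[str], str]


def generate_via_english(
    prompt: str, lang: str, translate: Translate, generate: Generate
) -> str:
    """Translate the prompt to English, generate there, translate the result back."""
    if lang == "en":
        # Nothing to translate: call the model directly.
        return generate(prompt)
    english_prompt = translate(prompt, lang, "en")
    english_output = generate(english_prompt)
    return translate(english_output, "en", lang)
```

The model only ever sees English, so output quality depends on the translation step in both directions rather than on the model's grasp of the target language.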

But there is hope! BigScience is about to release a huge NLP model that should theoretically work very well in almost 50 languages: https://bigscience.huggingface.co/ . We'll soon see if it's true!

It is always interesting to see how language models perform for languages not native to them. My trial with Korean (also agglutinative) was amusing:

[Prompt, in Korean:] In order to do machine learning in Hangul [end of prompt; GPT-J still continuing in Korean:] do you think it's needed right now?

[GPT-J switching to English:] English: When the world's about to explode, it's time to learn Korean. If you learn Korean it's time to learn the Japanese language, like the language of Trump. [snip]

Freaky. You ever talk to someone right after surgery when they're still high on the anaesthesia and they don't remember anything they're saying? Like, every few words together make sense but as a whole the thing is nonsense. The truly creepy part of the experience of talking to someone like that - similar to this AI - is that they mean to say something a person wouldn't normally say.

When my dad had brain surgery and woke up, I was in the room. The first thing the nurse asked was "who's the president?" And my dad (who actually voted for Trump, I think) couldn't think of Trump's name but he yelled out "He lies like a rug!"

After the second time the nurse realized he did know who the president was.

[GPT-J switching to English:] English: When the world's about to explode, it's time to learn Korean.

This is worthy of being on a T-shirt. Especially these days.

I always use jokes to test language models.

how many a.i. agents does it take to change a lightbulb?

that depends on how many times you've asked that question.

Conceptually speaking, is it a joke? I can't tell. But as an assistant joke generator it could be invaluable. And it certainly makes one wish they had access to Google PaLM with its 540B parameters and web-scale training data ;)

Sounds about on par with anything you'd get in a reddit comment section
Therein lies the major means of information warfare in the next decade.
In English it pretty much just copies and pastes racist rants from neo-Nazis once you get it started on a political topic. I'm sort of surprised it was more woke in Hungarian.