Hacker News
by dontupvoteme 1118 days ago
You know that thing I want from your large language model?

I just submitted my query for it in Finnish, Japanese, Russian, Hebrew, German, French, Latin, Farsi, Basque, and English, plus a few dozen more for good measure and to cover the linguistic landscape.

Is there any reason to believe watermarking LLMs will hold up in this scenario?

1 comments

It's all numbers underneath. GPT doesn't see different languages; the same data transformations can be applied universally.
I'm dubious. At a bare minimum, the 'same' prompt for code translated into other languages produces dramatically different results -- at least it did under Codex.

It also thinks it can translate to Sindarin and back, but it just seems to tolkenize everything and to have a vocabulary of about 35 words, most of which are the sun and the moon.

The Cat in the Hat is pretty amazing when translated to it and back, though.