

by SlinkyOnStairs 79 days ago

> It's just a token predictor what do you expect?

The point isn't that it's unexpected. It's that earlier speech-to-text systems handled this particular failure mode much better: they were prone to spitting out entirely incorrect words, but they didn't rephrase entire sentences.

This is a particularly bad failure mode because people don't notice it.

> What we need are tools that embrace that and ping the agent to validate what it just said or double check.

This is not a problem that can be fixed by throwing more AI at it. It's a problem shared by all such systems, whether they're audio-to-text transformers or LLMs. Agentic review would just push the system further towards creating output that looks correct but is not.

LLM translation does the same: it yields more natural text, but generally not a better translation. In several cases, especially "easy" translation between similar languages (e.g. within a language group like Germanic or Nordic), LLM-powered translation is notably worse than the more primitive "word & phrase book" systems. It tends to change the meaning of the text in order to get good grammar, whereas the older systems would give crude or grammatically incorrect translations that still retained the core meaning.


Semaphor 79 days ago

I often (ish) translate between English and German, two languages I speak very well. The quality of translation is amazing and far better than what old systems did.

Maybe it depends on topics or length, for me it's usually 1-2 paragraphs of a German article to share online.


netdevphoenix 78 days ago

> The quality of translation is amazing and far better than what old systems did.

Are you a native speaker of both languages? If you are native in only one of them, it would be useful to find out whether people with your skill set, but native in the language you are not, share your opinion.


Semaphor 77 days ago

It’s rather unlikely that the translation is great in one direction but lacking in the other, while also being just good enough (compared to before) that my close-to-native English misses the problems, while the old Google Translate somehow magically made me think it was bad.

Sadly there are no examples here to compare.


SlinkyOnStairs 79 days ago

> Maybe it depends on topics or length, for me it's usually 1-2 paragraphs of a German article to share online.

Same languages, same use case. My experience is different, on both Google Translate and others. ¯\_(ツ)_/¯


Semaphor 78 days ago

Haven’t used Google Translate in a long time, mostly because of its quality issues before LLMs. DeepL was leading for a while; nowadays I’m very happy with Kagi Translate.


jacobr1 79 days ago

Older ML systems were much better at exposing their internal confidence. Plenty of papers recover this kind of interpretability for open-weight models. All the models exposed logprobs early on. This seems solvable if prioritized: the unintelligible words should come out with lower confidence. Getting per-token data for the output that helps with understanding the predictions is entirely feasible as an engineering effort. It won't be enough to address all the problems, but it should help quite a bit.
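
To make the per-token idea concrete, here is a rough sketch of what surfacing low-confidence words could look like. Everything in it is an assumption for illustration: the `Token` type, the 0.5 threshold, and the logprob values are made up and do not come from any particular model or API. It only assumes the model hands back one natural-log probability per output token.

```python
import math
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical container: one output token plus its log probability."""
    text: str
    logprob: float  # natural log of the probability the model gave this token


def flag_low_confidence(tokens, threshold=0.5):
    """Return (text, probability) pairs whose probability is below threshold."""
    flagged = []
    for tok in tokens:
        prob = math.exp(tok.logprob)  # convert logprob back to a probability
        if prob < threshold:
            flagged.append((tok.text, prob))
    return flagged


def render(tokens, threshold=0.5):
    """Join tokens into text, wrapping uncertain ones as [?word] for the reader."""
    parts = []
    for tok in tokens:
        if math.exp(tok.logprob) < threshold:
            parts.append(f"[?{tok.text}]")
        else:
            parts.append(tok.text)
    return " ".join(parts)


# Made-up transcript: the last word stands in for a mumbled, hard-to-hear term.
transcript = [
    Token("the", -0.01),
    Token("patient", -0.05),
    Token("takes", -0.02),
    Token("metoprolol", -1.9),
]

flagged = flag_low_confidence(transcript)
marked = render(transcript)
print(marked)
```

The catch is the one already mentioned: a model that confidently rephrases a sentence will assign high probability to the rephrasing, so a marker like this only catches the cases where the model itself was unsure.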
