Hacker News
by nonameiguess 1091 days ago
There are already machine-translation services trained and created specifically for that purpose. While it's amazing that LLMs can do this, and do it pretty well, without being trained specifically for this one purpose, training something to do literally all text generation tasks is expensive compared to training something specifically to do language-to-language translation.

For a maybe more obvious example, say that LLMs ever got good enough to do arbitrary precision arithmetic on numbers up to hundreds of digits. Would that be a good use of one when calculators can already do this and are far cheaper to produce? I guess it makes no difference from a free-tier consumer's perspective, but it's still more expensive even if you aren't personally paying the expense.
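To make the calculator side of that comparison concrete, here is a minimal sketch of exact arithmetic on numbers hundreds of digits long using Python's built-in integers, which are arbitrary precision. The operands and the helper name `exact_product` are made up for illustration.

```python
def exact_product(a: int, b: int) -> int:
    """Multiply two integers exactly; Python ints have no fixed width."""
    return a * b


# Hypothetical operands with 201 and 151 digits.
a = 10**200 + 12345678901234567890
b = 10**150 + 987654321

product = exact_product(a, b)

# Dividing back recovers the first operand with no remainder,
# so nothing was rounded or truncated along the way.
quotient, remainder = divmod(product, b)
print(quotient == a, remainder == 0)
print(len(str(product)), "digits")
```

Conventional software does this deterministically with no model involved, which is the cost gap the comment is pointing at.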

4 comments

The quality difference is substantial. I don't care if it's wasteful to use something that has many uses for a supposedly narrow task (although I don't see translation as a narrow task any more than I see writing as one). I would gladly waste untold trillions of floating point operations for a 1% increase in translation quality. From my experiments, though, the improvement is much larger than 1%. And regardless of how wasteful the compute is, it's actually cheaper in terms of dollars. Using GPT-3.5 to translate Korean to English would cost about $11 per million words, based on the average characters per token of the small sample of text I gave it. DeepL (the best translation service I could find) costs $25 per million characters, or for my sample text, about $64 per million words. At $11 per million words I can have GPT-3.5 perform multiple translation passes and use its own judgment to pick the best translation and STILL save money compared to DeepL.
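The arithmetic behind that comparison can be sketched roughly as follows. The $25, $64, and $11 figures come from the comment. The characters-per-word ratio is back-derived from those figures rather than measured, and the function name and the assumption that every extra GPT-3.5 call costs about as much as one translation pass are illustrative only.

```python
import math


def cost_per_million_words(price_per_million_units: float,
                           units_per_word: float) -> float:
    """Convert a per-character (or per-token) price into a per-word price."""
    return price_per_million_units * units_per_word


# DeepL: $25 per million characters, quoted as about $64 per million words,
# which implies roughly 2.56 characters per word for the sample text.
deepl_price_per_million_chars = 25.0
implied_chars_per_word = 64.0 / deepl_price_per_million_chars
deepl_per_million_words = cost_per_million_words(
    deepl_price_per_million_chars, implied_chars_per_word
)

# GPT-3.5: about $11 per million words, taken directly from the comment.
gpt35_per_million_words = 11.0

# Upper bound on GPT-3.5 calls that fit inside the DeepL price, assuming
# (hypothetically) that each call costs about one translation pass.
max_calls = math.floor(deepl_per_million_words / gpt35_per_million_words)

print(f"DeepL:   ${deepl_per_million_words:.2f} per million words")
print(f"GPT-3.5: ${gpt35_per_million_words:.2f} per million words")
print(f"Calls within the DeepL budget: {max_calls}")
```

Under those assumptions there is room for several translation passes plus a judging pass before the total reaches the DeepL price; real costs would depend on the actual token counts of each call.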

GPT-4 is a much, much better translator than Google Translate and the like. You should absolutely be using GPT for translations, especially for distant language pairs that quickly devolve into nonsense with Google Translate, DeepL, etc.

https://www.reddit.com/r/Korean/comments/13lkh6c/gpt4_is_far...

https://github.com/ogkalu2/Human-parity-on-machine-translati...

LLMs are the machine-translation services created specifically for that purpose[0]; it just turned out they're very good at many other things!

Your analogy is like asking why anyone would use a computer to multiply numbers when a calculator, which is much cheaper, can do it. Sure, but if you already have a computer, there's no need for a dedicated calculator as well.

[0] https://nlp.seas.harvard.edu/annotated-transformer/#results

The original paper[0] that laid the foundation for modern LLMs was demonstrated on machine translation tasks. It's one of the primary use cases these architectures were designed for. What other types of models do you have in mind that outperform them?

[0] "Attention Is All You Need" https://arxiv.org/pdf/1706.03762.pdf