I've tried it between English and Dutch (my native language). It's pretty fluent, makes fewer grammar mistakes than Google Translate, and generally gets the gist of the meaning across. It's not a purely syntactic translation, which is why it can work even between some really obscure language pairs, or indeed programming languages. Where it goes wrong is when it misunderstands context. It's not an AGI and may not pick up on all the subtleties. But it's generally pretty good.

I ran the abstract of this article through ChatGPT. Flawless translation as far as I can see. To be fair, Google Translate also did a decent job. Here's the ChatGPT translation.

Veel NLP-toepassingen vereisen handmatige gegevensannotaties voor verschillende taken, met name om classificatoren te trainen of de prestaties van ongesuperviseerde modellen te evalueren. Afhankelijk van de omvang en complexiteit van de taken kunnen deze worden uitgevoerd door crowd-werkers op platforms zoals MTurk, evenals getrainde annotatoren, zoals onderzoeksassistenten. Met behulp van een steekproef van 2.382 tweets laten we zien dat ChatGPT beter presteert dan crowd-werkers voor verschillende annotatietaken, waaronder relevantie, standpunt, onderwerpen en frames detectie. Specifiek is de zero-shot nauwkeurigheid van ChatGPT hoger dan die van crowd-werkers voor vier van de vijf taken, terwijl de intercoder overeenkomst van ChatGPT hoger is dan die van zowel crowd-werkers als getrainde annotatoren voor alle taken. Bovendien is de per-annotatiekosten van ChatGPT minder dan $0.003, ongeveer twintig keer goedkoper dan MTurk. Deze resultaten tonen het potentieel van grote taalmodellen om de efficiëntie van tekstclassificatie drastisch te verhogen.

Translating the Dutch back to English using Google Translate (to rule out model bias), you get something that is very close to the original and still correct:

Many NLP applications require manual data annotations for various tasks, especially to train classifiers or evaluate the performance of unsupervised models. Depending on the size and complexity of the tasks, these can be performed by crowd workers on platforms such as MTurk, as well as trained annotators, such as research assistants. Using a sample of 2,382 tweets, we show that ChatGPT outperforms crowd workers for several annotation tasks, including relevance, point of view, topics, and frames detection. Specifically, ChatGPT's zero-shot accuracy is higher than crowd workers for four of the five tasks, while ChatGPT's intercoder agreement is higher than both crowd workers and trained annotators for all tasks. In addition, ChatGPT's per-annotation cost is less than $0.003, about twenty times cheaper than MTurk. These results show the potential of large language models to dramatically increase the efficiency of text classification.

I'm sure there are edge cases where you can argue the merits of some of the translations, but it's generally pretty good and usable.
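
If you want something a bit less hand-wavy than eyeballing the round trip, here is a minimal sketch of how the comparison could be scored. It assumes you already have the original text and the back-translated text as plain strings; the function names `round_trip_similarity` and `changed_spans` are made up for illustration, and word-level overlap from Python's `difflib` is only a rough proxy for translation quality, not a real metric like BLEU. The sample strings are placeholders, not the actual abstract.

```python
import difflib
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (punctuation dropped)."""
    return re.findall(r"\w+", text.lower())


def round_trip_similarity(original, back_translated):
    """Return a 0..1 word-level similarity between the two texts."""
    a = tokenize(original)
    b = tokenize(back_translated)
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def changed_spans(original, back_translated):
    """List (operation, original words, back-translated words) for every difference."""
    a = tokenize(original)
    b = tokenize(back_translated)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    # Placeholder strings for illustration only.
    original = "These results show the potential of large language models."
    round_trip = "These results demonstrate the potential of large language models."
    print(round_trip_similarity(original, round_trip))
    print(changed_spans(original, round_trip))
```

A high score only tells you the wording survived the round trip. Two translators can also make the same mistake in opposite directions and cancel out, so the listed differences are worth reading rather than just the number.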