by tkgally 519 days ago
What models have you been using for that? While I haven’t tried automating the production of vocabulary lists through an API, within the last few weeks I have had the chat versions of ChatGPT 4o, Claude Sonnet 3.5, and one of the latest Gemini models produce annotated vocabulary lists based on literary texts in English, Russian, and Latin. I didn’t spot any hallucinations.

I was asking only for the meanings of the words and phrases, though. I didn’t ask for things like pronunciations, grammatical categories, etc. In the past, when I’ve tried to get that kind of granular information from LLMs, there were indeed errors, presumably because of tokenization issues.

A few days ago, I ran some similar tests with Japanese, asking for readings of kanji and jukugo in an extended text. All of the models I had tried before for such tasks had screwed up. This time, however, ChatGPT o1 scored 100%. It was also able to analyze sentence grammar accurately, unlike the other models I tried. I was impressed.

At current API prices, though, o1 might be a bit too expensive for such a task.


I wonder if there are any benchmarks specifically designed to evaluate LLMs' performance on language-learning tasks.
I haven’t heard of any. It would be great if there were.