Hacker News
Generating Cognateful Sentences with LLMs (vkethana.com)
5 points by vkethana 532 days ago
4 comments

This is really neat, from both a technical and a language learning perspective.

I quite like the idea of eliminating English from the UI for immersion. You could also go with French + icons, so there's a little learning there too.

> Is there a cheaper, more scalable way to score sentences than what I’ve described here?

I estimate English sentence difficulty for my program using some grammar heuristics plus a lookup of which CEFR level each word belongs to, then (roughly; it's a little more complex) averaging the word difficulties.
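
As a rough illustration of the word-level half of that idea, here is a minimal Python sketch. It is not the actual implementation: the `WORD_LEVELS` lexicon, the numeric mapping in `CEFR_LEVELS`, and the `UNKNOWN_LEVEL` fallback are all hypothetical placeholders, and the grammar heuristics are left out entirely.

```python
import re

# Assumption: CEFR bands mapped onto a simple 1-6 numeric scale.
CEFR_LEVELS = {"A1": 1, "A2": 2, "B1": 3, "B2": 4, "C1": 5, "C2": 6}

# Hypothetical toy lexicon; a real one would be loaded from a CEFR word list.
WORD_LEVELS = {"the": "A1", "cat": "A1", "sleeps": "A2", "ubiquitous": "C1"}

# Assumption: words missing from the lexicon are treated as fairly hard.
UNKNOWN_LEVEL = "B2"


def sentence_difficulty(sentence):
    """Return the mean CEFR score of the words in an English sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if not words:
        return 0.0
    scores = [CEFR_LEVELS[WORD_LEVELS.get(w, UNKNOWN_LEVEL)] for w in words]
    return sum(scores) / len(scores)
```

A plain mean is only the simplest choice here; weighting rare words more heavily, or taking the hardest word in the sentence, would be equally plausible variants.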

I wonder if it'd be worth trying a similar technique to estimate the difficulty/cognate-score.

Maybe some process like:

- get the words

- get their English translation (easier said than done...)

- estimate how close they are to a cognate: perhaps a little classical ML here?
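
The three steps above could be sketched like this in Python. To keep it self-contained, the translation step is a hypothetical hard-coded `GLOSSARY`, and the "how close to a cognate" step uses plain string similarity from the standard library's `difflib` rather than a trained classical ML model, so treat it as a baseline and not as the approach suggested above.

```python
import difflib
import unicodedata

# Hypothetical toy French -> English glossary standing in for a real translator.
GLOSSARY = {"nation": "nation", "université": "university", "chien": "dog"}


def strip_accents(text):
    """Drop combining accent marks so that 'é' compares equal to 'e'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def cognate_score(french_word, english_word):
    """Spelling similarity in [0, 1]; higher means more cognate-like."""
    a = strip_accents(french_word.lower())
    b = strip_accents(english_word.lower())
    return difflib.SequenceMatcher(None, a, b).ratio()


def sentence_cognate_score(french_words):
    """Average the per-word scores, skipping words missing from the glossary."""
    scores = [cognate_score(w, GLOSSARY[w]) for w in french_words if w in GLOSSARY]
    return sum(scores) / len(scores) if scores else 0.0
```

Spelling similarity will miss cognates that have drifted apart and will reward false friends, which is where a learned model or a second signal could help.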

Alternatively, you could exploit etymology. If the English translation has either French or Latin etymological roots, they're probably cognates.

Or (as I like to do with my project), do all of it and average them together!
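
A minimal sketch of the etymology check and the "average them together" step, in Python. The `ETYMOLOGY` table is a hypothetical stand-in for a real etymological dictionary, and the unweighted mean in `combined_score` is an assumption; nothing above says how the signals are actually weighted.

```python
# Hypothetical etymology table; a real one would come from an etymological dictionary.
ETYMOLOGY = {
    "nation": "latin",
    "university": "latin",
    "beef": "french",
    "dog": "germanic",
}

# Origins treated as evidence that the English word is a likely cognate.
ROMANCE_ORIGINS = {"french", "latin"}


def etymology_signal(english_word):
    """Return 1.0 if the word has French or Latin roots, else 0.0."""
    origin = ETYMOLOGY.get(english_word.lower())
    return 1.0 if origin in ROMANCE_ORIGINS else 0.0


def combined_score(signals):
    """Unweighted mean of whichever per-word signals are available."""
    values = list(signals)
    return sum(values) / len(values) if values else 0.0
```

For example, `combined_score([etymology_signal("nation"), 0.5])` averages the etymology flag with some other estimate (here a made-up 0.5) for the same word.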

You might also want to try Claude 3.5 for scoring - it's about as expensive as 4o, and might be more effective. Amazon will also give you free credit via Bedrock and AWS Activate.

> Users should be able to understand how the app works without reading an entire blog post about it.

I have precisely this problem. I think I've distilled my tool (https://nuenki.app) down to something simple, and the HN audience seems to think so, but then I show it to non-technical people and it all goes wrong. Maybe look into how Lingua Latina per se Illustrata is presented?

Maybe try streaks for gamification? They seem to work well in Anki and Duolingo.

Nuenki looks pretty cool! Thanks for sharing this. I will definitely check out Claude 3.5 for scoring. Quick question: Your site mentions using Claude 3.5-Sonnet for translating sentences. On average, how much inference cost does each user typically incur?

I built an app that teaches French by generating "cognateful" sentences: French text that English speakers can partially understand through cognate words, with difficulty automatically calibrated using LLMs to score sentence comprehensibility.

The project shows how AI can automate the creation of comprehensible input for language learning, replacing the traditional approach where native speakers manually craft thousands of carefully sequenced sentences.

Interesting, but practical? We have this great new technology to translate into local or even personal dialects. What's the use in settling for cognateful rather than native? It's like using a car to flatten the snow so that you can walk behind it more easily, rather than riding in the car. Maybe it could be useful in creating bilingual signage where there isn't space for both languages, and understanding is optional.

I think the main practical benefit of cognateful sentences is that they can speed up the process of learning a new language.

"Cognateful" isn't a word. Please don't make it one. I'm not even sure what it should mean.