Show HN: Turn native language audio into flashcards and shadowing practice (lingochunk.com)
24 points by alder 3 hours ago
Here is a tool I built initially for myself to help with my German and Greek language studies. It started as a hack for creating Anki cards from native language audio. It extracts the words, finds their base forms (lemmas) and groups the examples by lemma. At some point I realised that I had a transcription with word-level timestamps, which opens up a lot of other opportunities. So I added a mode where you click the first and last word of a fragment in the transcript and it starts looping that fragment with the right gap and repeat count.
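
For readers curious how word-level timestamps make both features possible, here is a minimal sketch in Python. It is an illustration only, not the app's actual code: the `Word` record, the function names, and the rule that the pause equals the fragment length times `gap_ratio` are all assumptions, and the lemmas are assumed to come from some external lemmatizer.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Word:
    text: str      # surface form as transcribed
    lemma: str     # base form (assumed to be supplied by a lemmatizer)
    start: float   # start time in seconds
    end: float     # end time in seconds
    sentence: int  # index of the sentence this word belongs to


def group_examples_by_lemma(words, sentences):
    """Map each lemma to the distinct example sentences it occurs in."""
    groups = defaultdict(list)
    for w in words:
        example = sentences[w.sentence]
        if example not in groups[w.lemma]:
            groups[w.lemma].append(example)
    return dict(groups)


def loop_schedule(words, first, last, repeats=3, gap_ratio=1.0):
    """Return one (start, end, pause) tuple per repetition of words[first..last].

    The pause is the fragment duration times gap_ratio (a hypothetical rule),
    so the listener gets time to repeat the fragment aloud.
    """
    if first > last:  # allow clicking the two words in either order
        first, last = last, first
    start, end = words[first].start, words[last].end
    pause = (end - start) * gap_ratio
    return [(start, end, pause)] * repeats


words = [
    Word("Ich", "ich", 0.0, 0.5, 0),
    Word("gehe", "gehen", 0.5, 1.0, 0),
    Word("Er", "er", 2.0, 2.5, 1),
    Word("ging", "gehen", 2.5, 3.0, 1),
]
sentences = ["Ich gehe", "Er ging"]

groups = group_examples_by_lemma(words, sentences)
schedule = loop_schedule(words, 0, 1, repeats=2)
```

Here "gehe" and "ging" both land under the lemma "gehen", and `schedule` describes two passes over the first sentence, each followed by a pause as long as the fragment itself.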

Another feature I use a lot is selecting an audio fragment and sending a predefined prompt to an AI, such as "explain grammar" or "explain nuances of meaning". I am still experimenting with the prompts.

And because shadowing is so easy I also use it as a player to improve my English pronunciation. (I am not a native English speaker.)

I made a quick video showing the workflow for creating Anki cards and shadowing: https://youtu.be/TaR58uuDBvU?si=o5aGLAi2S-BZ7Zy9

The app supports 15 input languages (Japanese and Chinese are the latest experimental additions), and more than 30 output languages.

I would really appreciate it if you could try it at https://lingochunk.com/try. I know there are other tools with similar functionality, but I created something that fits my workflow, and it is fun to build.

Also I struggled to find public domain audio for the try page. I'd be grateful if anyone could point me to public domain sources (I used LibriVox, Wikimedia and FSI courses), or if you're a creator, let me feature some of your own recordings with credits and links.

7 comments

Very cool! I'm also learning Greek and it's amazing how many resources are becoming available.
I don't know what resolution or display you built this on, but a heads up: the initial impression on my 4K monitor is that everything is incredibly tiny.
To be honest I haven't tested it on a 4K monitor yet, so I am not surprised. There are two controls above the transcript that change the font size and the line spacing, which should help a bit for now. Something to fix, thanks!
Very nice work. I'm going for a different thing, but my audio2anki tool [1] is about as streamlined as I could make it to turn a YouTube URL I want to learn into a stack of Anki flashcards, purely locally.

[1]: https://github.com/hiAndrewQuinn/audio2anki

What are you doing for Chinese word segmentation/pinyin?
Just tried it with an unsupported language and it still worked. I set it to Chinese and input the audio, and still got correct results.
Is it possible to add traditional characters for Mandarin?

Also, the pinyin for 誰/谁 is coming through as shuí. The character has two pronunciations, but I believe shéi is the more common one.

Thanks! Chinese and Japanese as source languages are still experimental. I did my best to support them, but I have to rely on people who actually know the languages, so this kind of feedback is really useful. I'll look into adding traditional characters and fixing the pinyin.
This is awesome! I’ll be lurking for new data sources. I’m working on a self-hosted language app focused more on cloze and sentence mining into Anki. I love seeing more stuff happening in this space.
Thanks! I am glad you like it! I essentially mine the source audio, and all examples have cloze-style gaps (blurring, in my case) that are revealed on the back of the card. I also beep out the word in the sentence when you play it on the front of the card in the built-in SRS system. Unfortunately that is not implemented in the Anki export, but it is technically possible.
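
As a rough illustration of the cloze idea (a sketch, not the app's actual code), hiding a word on the front of a card only needs the character span of the target word within the sentence. The function names and the underscore mask are assumptions; the `{{c1::...}}` form is Anki's standard cloze deletion syntax.

```python
def make_cloze(sentence, start, end, mask="_"):
    """Return (front, back): front hides sentence[start:end], back shows it all."""
    if not 0 <= start < end <= len(sentence):
        raise ValueError("span must lie inside the sentence")
    hidden = mask * (end - start)  # keep the gap as wide as the hidden word
    front = sentence[:start] + hidden + sentence[end:]
    return front, sentence


def to_anki_cloze(sentence, start, end):
    """Wrap the target span in Anki's cloze deletion markup."""
    if not 0 <= start < end <= len(sentence):
        raise ValueError("span must lie inside the sentence")
    return sentence[:start] + "{{c1::" + sentence[start:end] + "}}" + sentence[end:]


front, back = make_cloze("Ich gehe nach Hause", 4, 8)
anki_text = to_anki_cloze("Ich gehe nach Hause", 4, 8)
```

The same word boundaries that drive the text gap could in principle mark the audio range to beep over, since each word also carries start and end timestamps.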