by solarmist
1404 days ago
I think it's conflating doing anything that might help with doing the most valuable things. As a concrete example, if you looked up 25k words once each in a dictionary at 10 seconds per lookup (which is speedy for a digital or paper dictionary), you'd spend roughly 70 hours just looking things up. You'd be hard-pressed to convince me that getting very good at finding stuff in a reference is directly improving my language skills.
The intermediate plateau is because of Zipf's law. In a 300-page book there are ~5500 unique words, and ~3000 of them occur only once or twice. This isn't a big deal for native speakers, because a 300-page book is about 100k words (1 day's worth of content), but for a language learner that might take weeks or months to cover. To go further, that native speaker will probably encounter those words again within ~40 days, but it might be years before the learner re-encounters all of them (having long since forgotten them).
Your time is best spent focusing on the sentences (30% of the book) that contain those 3000 words, because they also use almost all of the rest of the words.
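To make the arithmetic and the selection idea concrete, here is a rough Python sketch. Treat it as an illustration under assumptions, not a finished tool: the function names are made up for this example, the sentence splitter is a naive regex, tokenization only handles plain ASCII words, and "rare" is assumed to mean "occurs at most twice in the text". It does not verify the ~5500 / ~3000 / 30% figures above; you would need to run it on an actual book for that.

```python
import re
from collections import Counter


def lookup_hours(n_words, seconds_per_lookup=10):
    """Total hours spent if every word is looked up exactly once."""
    return n_words * seconds_per_lookup / 3600


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased ASCII word tokens; real text needs a proper tokenizer."""
    return re.findall(r"[a-z']+", sentence.lower())


def rare_word_sentences(text, max_count=2):
    """Return (sentences containing a rare word, set of rare words, word counts)."""
    sentences = split_sentences(text)
    counts = Counter(w for s in sentences for w in tokenize(s))
    # "Rare" here means the word occurs at most max_count times in the whole text.
    rare = {w for w, c in counts.items() if c <= max_count}
    selected = [s for s in sentences if rare & set(tokenize(s))]
    return selected, rare, counts


def coverage(selected, counts):
    """Fraction of the text's unique words that appear in the selected sentences."""
    seen = {w for s in selected for w in tokenize(s)}
    return len(seen) / len(counts)
```

Calling `lookup_hours(25_000)` gives a bit over 69 hours, which is where the "roughly 70 hours" comes from. On a real book, `rare_word_sentences(book_text)` followed by `coverage(selected, counts)` would show what share of the sentences contain a rare word and how much of the total vocabulary those sentences cover.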
|
This seems to assume (i) that readers of a 300-page book in a foreign language (not typically a beginner task!) are reading primarily as a means of learning and remembering unfamiliar words, and not because they want to understand the content of the book itself, develop their appreciation of literary phrasing, challenge themselves, etc., and (ii) that focusing on a [probably disjointed] subset of the sentences in the book won't deprive the reader of the context needed to grok those sentences, even when the words are familiar. I'm not sure either is generally true.
Ultimately, the alternative to learning new words or doing fill-in-the-blank exercises from machine-selected sentences isolated from long-form text is to use definitions and exercises specifically constructed to be accessible and relevant to language learners. The only obvious case where I can see the ML process generating more useful examples is when the learner's needs are skewed heavily towards absorbing the sort of specialist technical or professional vocabulary that conventional courses don't cover.
I also think that picking up common and uncommon idiomatic phrases is at least as important as learning individual words (though this is definitely something an ML tool can aid).