Hacker News
by pealco 3277 days ago
This doesn't really address your teacher's claim about having to look words up, though. What you want to look at is the distribution of low-frequency words across the book. What do the plots look like when you remove proper nouns, function words (e.g., "the", "and", prepositions) and, say, the 1000 most frequent words in English?
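A rough sketch of what that filtering could look like, in case it helps. Everything here is an assumption for illustration: the two word lists are tiny stand-ins (a real run would load a full function-word list and a top-1000 frequency list), the proper-noun check is a crude "never seen in lowercase" heuristic, and `rare_word_counts`, `n_bins` and `max_count` are made-up names.

```python
import re
from collections import Counter

# Hypothetical stand-in lists; swap in real function-word and
# top-1000 frequency lists for an actual analysis.
FUNCTION_WORDS = {"the", "and", "of", "in", "on", "to", "a", "at", "with"}
COMMON_WORDS = {"said", "man", "time", "day", "house"}


def rare_word_counts(text, n_bins=10, max_count=2):
    """Count low-frequency word occurrences in n_bins equal slices of text."""
    tokens = re.findall(r"[A-Za-z']+", text)
    # Crude proper-noun heuristic (an assumption): a word that never
    # appears in all-lowercase form is treated as a proper noun.
    lower_forms = {t for t in tokens if t.islower()}
    words = [t.lower() for t in tokens]
    freq = Counter(words)

    bins = [0] * n_bins
    for i, w in enumerate(words):
        if w in FUNCTION_WORDS or w in COMMON_WORDS:
            continue  # function word or very common word
        if w not in lower_forms:
            continue  # only ever seen capitalised
        if freq[w] <= max_count:
            # Map the token position to its slice of the book.
            bins[i * n_bins // len(words)] += 1
    return bins
```

Plotting the returned list (one bar per slice) would show whether the words a reader is likely to look up cluster early in the book or are spread evenly.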
3 comments

Would be very interesting to see this applied to blogs in different categories, as a way to learn languages rapidly through reading. Based on the words you currently know and the most frequent words in that language, it would always present you with the article that suits your level, so you get the benefit of learning the most new words.
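One way the selection could work, as a minimal sketch: among articles where the share of unknown words stays under some ceiling, pick the one that introduces the most new words. The function name `pick_article` and the 20% ceiling are assumptions for illustration, not anything from the article.

```python
import re


def pick_article(articles, known_words, max_unknown_share=0.2):
    """Return (title, new_words) for the readable article with the most new words.

    articles: dict mapping title -> text
    known_words: set of lowercase words the reader already knows
    """
    best_title, best_new = None, set()
    for title, text in articles.items():
        words = set(re.findall(r"[a-z']+", text.lower()))
        if not words:
            continue
        new = words - known_words
        # Skip articles that are too hard; prefer the most new words.
        if len(new) / len(words) <= max_unknown_share and len(new) > len(best_new):
            best_title, best_new = title, new
    return best_title, best_new
```

If nothing qualifies (every article is either too hard or has no new words), it returns `(None, set())`.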
Also would be interesting to see it applied to newspapers, with obvious slices like author, section (sports vs. world news, etc.), year-to-year distribution, and which paper. TV news broadcasting could also be interesting to compare along the same dimensions, though the conversational style of some interview shows would possibly make this less telling.
That's something worth trying.
I imagine it would look very much like the plots of unique words given in the article. As you suspect, the chances of coming across one of these are much more evenly distributed.
It would probably look more or less similar. They are excluded very quickly. There is something I cannot assess, though: how important is a given word to understanding the sequence?