HN Mirror



	by pealco 3277 days ago
	This doesn't really address your teacher's claim about having to look words up, though. What you want to look at is the distribution of low frequency words across the book. What do the plots look like when you remove proper nouns, functional words (e.g., "the", "and", prepositions) and, say, the top 1000 most frequent words in English?
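A minimal sketch of the analysis suggested above, assuming a plain-text book and a caller-supplied set of the top 1000 most frequent English words. The short `FUNCTION_WORDS` set and the proper-noun rule (a word that never appears in lowercase) are illustrative heuristics, not anything from the article. The rule misfires on ordinary words that only ever occur at the start of a sentence. The function splits the book into `n_bins` equal slices and counts the surviving words in each slice, which gives the numbers to plot.

```python
import re

# Hypothetical, deliberately tiny list of function words; a real analysis
# would use a fuller stopword list.
FUNCTION_WORDS = {"the", "and", "a", "an", "of", "to", "in", "on", "at", "for",
                  "with", "by", "from", "is", "was", "it", "that", "he", "she"}


def tokenize(text):
    """Split text into word tokens, keeping the original capitalization."""
    return re.findall(r"[A-Za-z']+", text)


def proper_nouns(tokens):
    """Heuristic (assumption): a word is a proper noun if it never appears in lowercase."""
    seen_lower = {t for t in tokens if t.islower()}
    return {t.lower() for t in tokens if t[0].isupper() and t.lower() not in seen_lower}


def rare_word_distribution(text, common_words, n_bins=10):
    """Count low-frequency tokens in each of n_bins equal slices of the book.

    common_words: e.g. the top 1000 most frequent English words (lowercase),
    supplied by the caller. Returns a list of n_bins counts.
    """
    tokens = tokenize(text)
    excluded = FUNCTION_WORDS | set(common_words) | proper_nouns(tokens)
    counts = [0] * n_bins
    for i, tok in enumerate(tokens):
        if tok.lower() not in excluded:
            # Map token position i to its slice of the book.
            counts[i * n_bins // len(tokens)] += 1
    return counts
```

For example, `rare_word_distribution(book_text, top_1000)` returns ten counts, one per tenth of the book. A roughly flat result would suggest that unfamiliar words keep turning up all the way through.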


anon1094 3277 days ago

It would be very interesting to see this applied to blogs in different categories as a way to learn a language quickly through reading, based on the words you already know and the most frequent words in that language. It could always present you with the article that suits your level, so you would learn the most new words.
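One way the idea above could work, sketched under loud assumptions: each article is a plain string, `known` is the set of words the reader already knows, and `frequent` is a set of the most frequent words in the language. An article counts as "at your level" if at least `min_coverage` of its tokens are known. The 0.95 default is an assumed threshold, not a figure from the thread. Among readable articles, the sketch picks the one with the most distinct new words from the frequent list.

```python
import re


def words(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def pick_article(articles, known, frequent, min_coverage=0.95):
    """Return the title of the readable article that teaches the most new frequent words.

    articles: dict mapping title -> text
    known: set of words the reader already knows
    frequent: set of the most frequent words in the language
    min_coverage: assumed fraction of tokens the reader must already know
    Returns None if no article is readable at the reader's level.
    """
    best_title, best_score = None, -1
    for title, text in articles.items():
        tokens = words(text)
        if not tokens:
            continue
        coverage = sum(t in known for t in tokens) / len(tokens)
        if coverage < min_coverage:
            continue  # too hard for the reader's current level
        new_useful = {t for t in tokens if t not in known and t in frequent}
        if len(new_useful) > best_score:
            best_title, best_score = title, len(new_useful)
    return best_title
```

After each article is read, the new words could be added to `known`, so the next recommendation moves up in difficulty.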


_asummers 3277 days ago

It would also be interesting to see it applied to newspapers, sliced along the obvious dimensions: particular author, section (sports vs. world news, etc.), year-to-year distribution, and which paper. TV news broadcasts could be compared along the same dimensions too, though the conversational style of some interview shows might make the results less telling.


kiechu 3277 days ago

That's something worth trying.


mannykannot 3277 days ago

I imagine it would look very much like the plots of unique words given in the article. As you suspect, the chances of coming across one of these words are much more evenly distributed.


kiechu 3277 days ago

It would probably look more or less similar; those words are excluded very quickly. There is something I cannot assess, though: how important is a given word to understanding the text?
