by inteoryx 1588 days ago
I had a similar intuition and graphed unique words per year while writing this. I found, surprisingly, that actually the reverse was true. Unique words per year go up, even as diversity goes down. Another finding that may explain this is that the articles get longer as time goes on - so a simple unique word count may just increase as a function of the authors using more words. There is a period from the late '80s to the early '90s where the average word count per article nearly doubles. I'd speculate that this is about when The Crimson switched to computers or decent word processing, or something else that made writing articles easier.

A graph that may get to the heart of your question is something like "Unique word percentage over time" or maybe "What percentage of articles use unique words".
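
For what it's worth, here is a rough sketch of how a "unique word percentage" per year could be computed. The `(year, text)` input format, the function name, and the crude regex tokenizer are all my assumptions, not anything from the original analysis.

```python
import re
from collections import defaultdict

def unique_word_percentage_by_year(articles):
    """Return {year: percent of that year's tokens that are distinct words}.

    `articles` is assumed to be an iterable of (year, text) pairs; that
    format is hypothetical and only used for illustration.
    """
    tokens_by_year = defaultdict(list)
    for year, text in articles:
        # Crude tokenizer: lowercase runs of letters and apostrophes.
        tokens_by_year[year].extend(re.findall(r"[a-z']+", text.lower()))
    return {
        year: 100.0 * len(set(tokens)) / len(tokens)
        for year, tokens in sorted(tokens_by_year.items())
        if tokens  # skip years with no words to avoid dividing by zero
    }

# Made-up sample data, not taken from the actual corpus.
sample = [
    (1950, "the team won the game"),
    (1950, "the team lost the game"),
    (2000, "students protest tuition increase"),
]
print(unique_word_percentage_by_year(sample))
```

One caveat: this ratio tends to fall as the amount of text grows, so years with very different total word counts still aren't directly comparable.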


> Unique words per year go up, even as diversity goes down. Another finding that may explain this is that the articles get longer as time goes on - so a simple unique word count may just increase as a function of the authors using more words.

You'd probably see the same effect if the articles don't get any longer, but more of them get written every year. Unique word count will always go up with words produced, even if most words produced are formulaic boilerplate.
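
A toy simulation of that effect, as a sketch only. The 90% boilerplate share and both vocabulary sizes are numbers I made up for illustration, and `unique_count` is a hypothetical helper.

```python
import random

def unique_count(n_words, rng, boilerplate_share=0.9):
    """Draw n_words tokens and return how many distinct ones appear.

    Assumption: boilerplate tokens come from a 50-word pool and the rest
    from a 50,000-word long tail. Both sizes are invented.
    """
    seen = set()
    for _ in range(n_words):
        if rng.random() < boilerplate_share:
            seen.add(("boilerplate", rng.randrange(50)))
        else:
            seen.add(("rare", rng.randrange(50_000)))
    return len(seen)

rng = random.Random(0)  # fixed seed so reruns give the same numbers
counts = {n: unique_count(n, rng) for n in (1_000, 10_000, 100_000)}
for n, unique in counts.items():
    print(f"{n:>7} words produced -> {unique} unique")
```

Even with nine out of ten tokens being boilerplate, the rare tail keeps contributing new words, so the unique count keeps climbing with volume.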

One nice thing about getting feedback is learning all of the additional stuff I should have included in the blog post. I did look at the number of articles per year: it fluctuates, but there isn't a huge change across the century, and it moves both up and down. Total word count, on the other hand, does trend up, and has risen faster in recent years.

Maybe you should just crop the same number of words from the start of each article, and take the same number of articles from each year by random sampling. That would make things easier to compare.
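
Something like this minimal sketch, assuming the articles are available as `(year, text)` pairs. The function name and parameters are hypothetical, and dropping short articles and thin years is just one possible way to handle them.

```python
import random
from collections import defaultdict

def normalized_sample(articles, words_per_article, articles_per_year, seed=0):
    """Crop every article to the same length, then sample the same number per year.

    Assumptions (mine, not the author's): articles shorter than the crop
    length are dropped, and years with too few remaining articles are skipped.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    by_year = defaultdict(list)
    for year, text in articles:
        words = text.split()
        if len(words) >= words_per_article:
            # Keep only the first words_per_article words.
            by_year[year].append(words[:words_per_article])
    sample = {}
    for year, cropped in sorted(by_year.items()):
        if len(cropped) >= articles_per_year:
            # Sample without replacement so no article is counted twice.
            sample[year] = rng.sample(cropped, articles_per_year)
    return sample
```

Every year that survives then contributes exactly `words_per_article * articles_per_year` words, so unique word counts are measured on equal-sized samples.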