by davidgay 1588 days ago
Without commenting on the overall trend's cause, your diversity hypothesis is bunk and suggests you are looking to make things fit a diversity-related narrative:

- there's (unsurprisingly) no significant diversity-word change from 1900 to 1940 but a very significant distance drop

- there's a big diversity-word change around ~1990 with no concomitant distance change


Your comment is a bit ironic: I can tell you didn't read the article, because you reproduce its conclusions. That's okay! Obviously you didn't need to read it to know what I would have said. :)

Let me quote from the end:

"Another argument against connecting distance and diversity is that distance is on a long running decline from 1900 even for the first four decades while diversity words were basically flat. When diversity words pop in the 90's there isn't an immediate reaction in cosine distance, it's only about a decade later, in 2000, that cosine distance takes a steep drop."

That seems awfully similar to the two points you've raised here.

What I do find a bit distasteful is that you jump in with "your diversity hypothesis is bunk" and accuse me of trying to fit a narrative - without even reading what you're commenting on.

Hey, I thought your article was nice. First, it had an easy intro to word embeddings and cosine similarity. Second, you followed the investigation and even came up with the argument "against connecting distance and diversity", so it didn't seem like you had the conclusion before you started the work.

If someone complains that variance isn't mentioned, it's still implicitly visible in the cloud of dots, each representing a year, around the regression line.