
by bquinlan 2455 days ago

I created a word cloud as a Valentine's Day gift this year:

- https://raw.githubusercontent.com/brianquinlan/word-cloud-va...

- https://github.com/brianquinlan/word-cloud-valentine/blob/ma...

My implementation (https://github.com/brianquinlan/word-cloud-valentine) is a lot less sophisticated than stylecloud, but I think I had a few interesting ideas about text extraction.

I used nltk to extract only nouns and to do word stemming (e.g. so that "time", "times" and "timing" are only counted as one word).
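A rough sketch of that extraction step, not the code from the repository. It assumes the Porter stemmer (the comment does not say which nltk stemmer was used) and Penn Treebank tags such as those returned by `nltk.pos_tag`; the helper names `nouns_only`, `count_by_stem` and `nltk_porter_stem` are made up for illustration.

```python
from collections import Counter


def nouns_only(tagged_words):
    """Keep words whose Penn Treebank tag starts with NN (NN, NNS, NNP, NNPS).

    `tagged_words` is a list of (word, tag) pairs, e.g. the output of
    nltk.pos_tag(nltk.word_tokenize(text)), which needs NLTK data downloaded.
    """
    return [word for word, tag in tagged_words if tag.startswith("NN")]


def count_by_stem(words, stem):
    """Count words after mapping each lowercased word to its stem."""
    return Counter(stem(word.lower()) for word in words)


def nltk_porter_stem():
    """Return NLTK's Porter stemming function (assumes nltk is installed)."""
    from nltk.stem import PorterStemmer  # imported lazily; optional dependency
    return PorterStemmer().stem
```

With nltk installed, `count_by_stem(nouns_only(tagged), nltk_porter_stem())` would give one count per stem instead of one per surface form. Any other callable that maps a word to a stem can be passed in its place.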

I also experimented a lot with various methods of determining word size: size proportional to frequency, to log(frequency), and to sqrt(frequency).
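To make the difference between the three concrete, here is a minimal sketch, again not taken from the repository. The function name `font_sizes`, the `min_size`/`max_size` parameters and the example frequencies are all assumptions for illustration.

```python
import math


def font_sizes(freqs, transform=lambda f: f, min_size=8, max_size=100):
    """Scale font sizes in proportion to transform(frequency).

    The most frequent word gets max_size. Sizes are floored at min_size so
    that rare words stay legible (with math.log, a frequency of 1 maps to 0).
    """
    values = {word: transform(f) for word, f in freqs.items()}
    top = max(values.values())
    if top <= 0:
        # Every transformed value is 0 (e.g. log of all-ones): use the floor.
        return {word: min_size for word in values}
    return {word: max(min_size, max_size * v / top) for word, v in values.items()}


freqs = {"love": 100, "time": 25, "cat": 4}   # made-up counts
linear = font_sizes(freqs)                    # proportional to frequency
by_log = font_sizes(freqs, math.log)          # proportional to log(frequency)
by_sqrt = font_sizes(freqs, math.sqrt)        # proportional to sqrt(frequency)
```

With these counts, linear scaling leaves "cat" at the size floor, while the log and sqrt variants compress the range so that less frequent words take up a larger share of the cloud.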

1 comment

cjauvin 2455 days ago

It's funny, I did exactly the same thing with my Hangouts Takeout extract a couple of weeks ago, but I didn't get as far, because I kept struggling with stopwords and with finding ways to filter out uninteresting stuff (my implementation was much more naive than yours). I'm still thinking about what other types of analysis I could perform on that dataset, though, since it's so personal.
