by chimi 2409 days ago
I came here to say the same thing as the GP. I don't understand why some words are red or green.

For example, you can type in non-brand words as well. I typed in "houses" and the word "homeless" came up in green!

With a brand, facebook, I got the word "amiriteguyze" in red, and clicking on it showed this:

Negative 11/19/2019, 12:13:31 PM

facebook is bad amiriteguyze?!?!?!?

Why is that even a word that would show up in the word cloud? I can't imagine it was entered a bunch of times. I can't intuit any correlation between the colors, sizes, or words themselves that show up in the clouds.

1 comment

The algorithm tries to give more weight to words that appear rarely and are used only with the chosen brand name (similar to TF-IDF). This is why odd words can sometimes surface in the word cloud, especially when the sample of messages is small.
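To illustrate the general idea (this is only a rough TF-IDF-style sketch, not the actual code behind the site), the snippet below scores each word in a brand's messages by how often it appears there, weighted down by how many messages overall contain it. The function names, the smoothing in the log term, and the sample messages are all assumptions made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(message):
    # Lowercase and keep only runs of letters/apostrophes (drops "?!?!").
    return re.findall(r"[a-z']+", message.lower())


def distinctive_words(brand_messages, all_messages, top_n=5):
    """Rank words in brand_messages with a TF-IDF-like score (hypothetical helper)."""
    # Term frequency: how often each word occurs in the brand's messages.
    tf = Counter(w for msg in brand_messages for w in tokenize(msg))
    # Document frequency: how many messages overall contain each word.
    df = Counter(w for msg in all_messages for w in set(tokenize(msg)))
    n_docs = len(all_messages)
    # Rare words get a larger weight; the +1 terms are simple smoothing.
    scores = {
        w: count * math.log((1 + n_docs) / (1 + df[w]))
        for w, count in tf.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Made-up sample data: one message for the brand, three others as background.
brand = ["facebook is bad amiriteguyze?!?!?!?"]
background = ["facebook is great", "twitter is bad", "facebook is slow"]
ranked = distinctive_words(brand, brand + background)
```

With so few messages, a one-off word like "amiriteguyze" ends up with the highest score simply because it appears nowhere else, which matches the small-sample effect described above.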

To prevent those words from appearing, I was thinking of implementing a dictionary check that only allows meaningful words. However, this approach also has a drawback: it restricts people's vocabulary and can miss important new concepts.
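A minimal sketch of what such a dictionary check might look like, and of the drawback, is below. The word list and the helper name are hypothetical; a real check would presumably load a much larger dictionary.

```python
def keep_dictionary_words(words, dictionary):
    # Keep only candidates found in the dictionary (case-insensitive).
    return [w for w in words if w.lower() in dictionary]


# Tiny made-up dictionary, for illustration only.
KNOWN_WORDS = {"bad", "privacy", "slow", "homeless"}

candidates = ["amiriteguyze", "privacy", "metaverse", "bad"]
filtered = keep_dictionary_words(candidates, KNOWN_WORDS)
dropped = [w for w in candidates if w not in filtered]
```

Here the noise word "amiriteguyze" is removed as intended, but so is "metaverse", a newer term that a fixed dictionary would not know about.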

Thanks for the feedback.