Hacker News new | ask | show | jobs
by jonathanbgn 2409 days ago
The algorithm will try to give more importance to words which appear rarely and are only used with the chosen brandname (similar to TF-IDF). This is why sometimes weird words can surface to the wordcloud, especially when the sample size of messages is small.

To prevent those words from appearing, I was thinking to implement some dictionary-check to only allow for meaningful words. However this approach also have drawback as you restrict people's words and can miss important new concepts.

Thanks for the feedback.