Hacker News
by pj_mukh 3230 days ago
This is super cool. I also appreciate the demos you have on your website (airlines, MBA schools, etc.). They make the end result super clear.

I don't know much about NLP, but are you only using unsupervised learning on the raw data? I would think you'd also need an NLP layer that sorts out basic issues like synonyms, phrasing differences, etc.


Thank you! No training data is required to use Thematic, and customers don't need to tell us what they want to find in the data. Hence we say "unsupervised". And yes, we extract synonyms and paraphrases from the raw data.

However, we do have tools to review and adjust Thematic's results by hand. For example, we have an internal drag-and-drop interface. Some customers really like to change the themes based on their own view of the data. It also helps remove inaccuracies, e.g. an incorrectly merged theme.
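The reply doesn't say how synonyms are found without training data. One classic unsupervised approach, shown here only as an illustration and not as Thematic's actual method, is distributional similarity: words that appear in similar contexts probably mean similar things. This minimal sketch builds a co-occurrence vector for each word from a small, hypothetical set of feedback comments and flags word pairs whose cosine similarity passes a threshold. The function names, the window size of 2, and the 0.8 threshold are all illustrative assumptions. A real system would need far more text, stopword handling, and likely learned embeddings.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def synonym_candidates(sentences, threshold=0.8, window=2):
    """Hypothetical helper: return (word1, word2, score) pairs whose contexts are similar."""
    vectors = context_vectors(sentences, window)
    pairs = []
    for w1, w2 in combinations(sorted(vectors), 2):
        score = cosine(vectors[w1], vectors[w2])
        if score >= threshold:
            pairs.append((w1, w2, round(score, 3)))
    # Highest-scoring candidates first; ties keep alphabetical order.
    return sorted(pairs, key=lambda p: -p[2])


# Toy, made-up feedback comments for illustration only.
feedback = [
    "the staff was friendly and helpful",
    "the crew was friendly and helpful",
    "the seat was cramped and uncomfortable",
    "the legroom was cramped and uncomfortable",
]

print(synonym_candidates(feedback))
# [('crew', 'staff', 1.0), ('legroom', 'seat', 1.0)]
```

With this toy corpus, "crew"/"staff" and "legroom"/"seat" share identical contexts and score 1.0, while a pair like "staff"/"seat" scores about 0.67 and falls below the threshold. Automatically merging candidates like these is the kind of decision a human reviewer might later want to adjust or undo.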