|
|
|
|
|
by datalink
4139 days ago
|
|
I think you're mixing up what topics are. The topics LDA generates are the concatenated word lists. Strictly, each topic is a probability distribution over all words in the corpus, and I concatenate its top 8 words to get a meaningful descriptor of the topic. So server-client-http-request-service-ruby-connection-user is one topic (one word distribution), in which "ruby" happens to be the 6th most probable word, likely because it appears a lot in posts about servers, web services, etc. It does not mean that the word "ruby" itself is classified as server-related. The same applies to the other examples you gave. The categories/domains I assigned manually, simply to show how one could interpret the word distributions that LDA generated. |
|
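To make the descriptor idea concrete, here is a rough sketch of how one topic's word distribution could be turned into a label like the one above. The probabilities and the `topic_label` helper are made up for illustration; they are not the actual values or code from my LDA run, and only the few most probable words of the vocabulary are shown.

```python
def topic_label(word_probs, top_n=8):
    """Join the top_n most probable words of one topic into a descriptor."""
    # Rank every word in the topic by probability, highest first.
    ranked = sorted(word_probs.items(), key=lambda kv: kv[1], reverse=True)
    return "-".join(word for word, _ in ranked[:top_n])


# Hypothetical excerpt of one topic's distribution over the vocabulary.
# These numbers are invented; a real topic assigns a probability to
# every word in the corpus, so the full distribution sums to 1.
topic = {
    "server": 0.050, "client": 0.041, "http": 0.037, "request": 0.033,
    "service": 0.029, "ruby": 0.024, "connection": 0.021, "user": 0.019,
    "python": 0.004, "kernel": 0.001,
}

label = topic_label(topic)
print(label)

# "ruby" is just one entry in the ranking, not a word that was
# classified on its own.
rank_of_ruby = label.split("-").index("ruby") + 1
print(rank_of_ruby)
```

The label is only a shorthand for the whole distribution: every word has some probability under every topic, and the rank of a single word says how typical it is of that topic, nothing more.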