Hacker News
by uoaei 1507 days ago
Well, they're not supposed to be the same clusters. The reason people develop new methods is to surpass the old ones.

I'm just saying that the method described in the link seems to be exactly what you are describing: using document embedding vectors as input to soft clustering mechanisms akin to LDA. Of course, it does not fit perfectly with the theoretical underpinnings of LDA, because those are constrained to count-based (bag-of-words) inputs.
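
For illustration only, here is a minimal pure-Python sketch of soft clustering over document embeddings, assuming the embeddings are already computed (toy 2-D vectors stand in for them). It uses soft k-means, a swapped-in technique that is neither the linked method nor LDA: each document gets a probability distribution over clusters, loosely analogous to LDA's per-document topic proportions. The names `soft_kmeans`, `sq_dist` and the `beta` stiffness parameter are hypothetical.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def soft_kmeans(vectors, k, beta=1.0, iters=50):
    """Soft k-means (hypothetical helper): one distribution over k clusters per vector.

    vectors: list of equal-length float lists (stand-ins for document embeddings)
    beta: stiffness; larger values push memberships toward hard assignments
    """
    # Deterministic farthest-point initialisation so the example is reproducible
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    dim = len(vectors[0])
    memberships = []
    for _ in range(iters):
        # E-step: softmax over negative squared distances to each centroid
        memberships = []
        for v in vectors:
            d2 = [sq_dist(v, c) for c in centroids]
            m = min(d2)  # shift by the minimum for numerical stability
            w = [math.exp(-beta * (d - m)) for d in d2]
            total = sum(w)
            memberships.append([x / total for x in w])
        # M-step: each centroid becomes the membership-weighted mean of all vectors
        for j in range(k):
            weight = sum(r[j] for r in memberships)
            if weight > 0:
                centroids[j] = [
                    sum(r[j] * v[i] for r, v in zip(memberships, vectors)) / weight
                    for i in range(dim)
                ]
    return memberships, centroids

# Toy 2-D "embeddings": two tight groups plus one document in between
docs = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9], [2.5, 2.5]]
memberships, _ = soft_kmeans(docs, k=2)
for row in memberships:
    print([round(p, 2) for p in row])
```

Each row sums to 1, so a document can sit partly in several clusters, which is the "soft" part; real embeddings would have hundreds of dimensions, and a Gaussian mixture is another common choice here.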

As an aside, "TFA" stands for "the fucking article" and is a reference to the classic Internet acronym "RTFM", standing for "read the fucking manual". Both are passive-aggressive-cum-colloquial ways to imply that answers are in places you would expect to find them, if only you would go and read the source.

Reply:

I'm pretty sure the method mentioned in the article assigns each document a single topic, whereas LDA models each document as a mixture of topics.
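
To make the distinction concrete, a small sketch assuming a per-document topic mixture is already available (the numbers are made up). Collapsing the mixture to its argmax gives the kind of single-topic label a hard clustering produces, and entropy measures how much mixing that discards. `to_hard` and `entropy_bits` are hypothetical helpers, not part of any LDA or BERTopic API.

```python
import math

def to_hard(mixture):
    """Collapse a topic mixture to a one-hot, single-topic assignment (argmax)."""
    best = max(range(len(mixture)), key=lambda i: mixture[i])
    return [1.0 if i == best else 0.0 for i in range(len(mixture))]

def entropy_bits(p):
    """Shannon entropy in bits: 0 for a one-hot vector, larger for more mixed ones."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Made-up LDA-style topic proportions for a single document
mixture = [0.55, 0.40, 0.05]
hard = to_hard(mixture)
print(hard)  # [1.0, 0.0, 0.0]
```

The 0.40 weight on the second topic survives in the mixture but vanishes from the hard label.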

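In LDA, "hierarchical" refers to the stacked Dirichlet-multinomial structure of the model over documents, topics and word counts: each document draws topic proportions, each word draws a topic from those proportions, and the word itself is drawn from that topic's word distribution.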
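
A minimal sketch of LDA's standard textbook generative process shows where that stacking lives. It only samples synthetic documents (fitting LDA means inverting this process), and the hyperparameter values `alpha` and `eta`, the tiny vocabulary and the function names are illustrative assumptions.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(n_docs, doc_len, vocab, n_topics, alpha=0.5, eta=0.1, seed=0):
    """Hypothetical helper: sample a synthetic corpus from the LDA generative story."""
    rng = random.Random(seed)
    # Topic level: each topic is a distribution over the vocabulary, phi_k ~ Dir(eta)
    phi = [sample_dirichlet([eta] * len(vocab), rng) for _ in range(n_topics)]
    corpus = []
    for _ in range(n_docs):
        # Document level: topic proportions theta_d ~ Dir(alpha)
        theta = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(doc_len):
            # Word level: topic z ~ Mult(theta), then word w ~ Mult(phi_z)
            z = rng.choices(range(n_topics), weights=theta)[0]
            words.append(rng.choices(vocab, weights=phi[z])[0])
        corpus.append((theta, words))
    return phi, corpus

vocab = ["model", "topic", "word", "cluster", "vector", "embedding"]
phi, corpus = generate_corpus(n_docs=3, doc_len=8, vocab=vocab, n_topics=2)
for theta, words in corpus:
    print([round(t, 2) for t in theta], words)
```

Every level is a distribution whose parameters come from the level above it; no relationship between the topics themselves is modelled.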

In BERTopic, "hierarchical" means assuming, and then recovering, a hierarchical relationship between the topics themselves at clustering time.
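
By contrast, a hierarchy among topics can be sketched as agglomerative clustering over topic representations. This is a hypothetical illustration, not BERTopic's implementation: it assumes each topic is already summarised as a word-weight vector (the toy vectors are made up) and greedily merges the two most cosine-similar nodes, representing a merged node by the size-weighted average of its children. `cosine` and `merge_topics` are illustrative names.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def merge_topics(topic_vecs):
    """Greedy agglomerative merging of topic vectors by cosine similarity.

    Returns the merge history as (left_members, right_members, similarity) tuples,
    which together describe a binary tree over the original topics.
    """
    # cluster id -> (member topic indices, representative vector)
    clusters = {i: ([i], list(v)) for i, v in enumerate(topic_vecs)}
    history = []
    while len(clusters) > 1:
        ids = list(clusters)
        pairs = [(p, q) for n, p in enumerate(ids) for q in ids[n + 1:]]
        # Pick the most similar pair of current clusters
        a, b = max(pairs, key=lambda pq: cosine(clusters[pq[0]][1], clusters[pq[1]][1]))
        sim = cosine(clusters[a][1], clusters[b][1])
        members_a, vec_a = clusters.pop(a)
        members_b, vec_b = clusters.pop(b)
        # Merged node: union of members, size-weighted average of the two vectors
        na, nb = len(members_a), len(members_b)
        merged = [(na * x + nb * y) / (na + nb) for x, y in zip(vec_a, vec_b)]
        clusters[min(a, b)] = (members_a + members_b, merged)
        history.append((members_a, members_b, sim))
    return history

# Made-up topic vectors: weights over a 4-word vocabulary
topics = [
    [0.9, 0.1, 0.0, 0.0],
    [0.8, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.3],
    [0.0, 0.1, 0.6, 0.3],
]
for left, right, sim in merge_topics(topics):
    print(left, "+", right, round(sim, 3))
```

Here the hierarchy is a tree built after the topics exist, which is a different object from the generative stacking inside LDA.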

They use the same word, but they appear to be very different things, at least to me.