by berto4
1507 days ago
Yeah, exactly my question. LDA is probabilistic and very performant if you clean up the documents well. The BERT-based approach seems pretty powerful, given that you can now cluster on semantics, not just word occurrence/frequencies as in LDA (though n-grams help). However, a hard clustering approach would mean that each document belongs to a single topic, rather than being made up of multiple topics. But this is a cool idea nonetheless.
[EDIT] Quickly checked it out; it seems to use some kind of soft clustering, so documents can belong to multiple clusters (topics).
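
To make the hard vs. soft distinction concrete, here is a rough sketch in plain Python. It is not how the BERT-based tool actually does it (I have not read its code); the distances, the `hard_assign`/`soft_assign` helpers, and the softmax weighting are all made up for illustration.

```python
import math

def hard_assign(distances):
    """Hard clustering: pick the single nearest topic centroid."""
    return min(range(len(distances)), key=lambda k: distances[k])

def soft_assign(distances, temperature=1.0):
    """Soft clustering (illustrative): softmax over negative distances,
    giving one membership weight per topic that sums to 1."""
    scores = [-d / temperature for d in distances]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical distances from one document embedding to three topic centroids.
doc_distances = [0.4, 0.5, 2.0]

hard_topic = hard_assign(doc_distances)    # a single topic index
soft_weights = soft_assign(doc_distances)  # a weight for every topic

print(hard_topic)
print([round(w, 2) for w in soft_weights])
```

With hard assignment the document lands in topic 0 only, even though topic 1 is almost as close. The soft weights keep that information, which is closer in spirit to LDA's per-document topic mixture.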