by b0b0b0b 4150 days ago
I'm confused by the discussion of multi-lingual corpora. Is it common in topic modeling to consider documents drawn from disjoint vocabularies, or is it just a kind of thought experiment?
1 comment

It's pretty common when you don't control the data source, or when the data comes from multilingual government agencies (in Canada, for example, you can have your court case heard in French if you wish).