Hacker News
by carom 494 days ago
Did you find the library jieba? That is what I am using for segmentation. It seems to work fine on simplified despite not advertising it.

I did! Jieba is the first step in my segmentation pipeline. As far as I can tell, Jieba's default config tends to work better for simplified, but in my case the custom dictionary I feed it has significantly more traditional entries than simplified entries, especially for historical terms and slang.
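
For anyone curious, here is a rough sketch of what that first step might look like. It assumes the custom dictionary is a plain-text file in jieba's user-dictionary format (one `word [freq]` entry per line). The helper names, the sample entries, and the frequencies are made up for illustration and are not my actual pipeline or data.

```python
import tempfile
from pathlib import Path


def write_userdict(entries, path):
    """Write (word, freq) pairs in jieba's user-dictionary format.

    Each line is 'word freq'; jieba also accepts an optional
    part-of-speech tag after the frequency, omitted here.
    """
    lines = [f"{word} {freq}" for word, freq in entries]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


def segment_with_userdict(text, userdict_path):
    """Segment text with jieba after loading a custom dictionary."""
    import jieba  # third-party dependency: pip install jieba

    # A separate Tokenizer keeps the custom entries out of jieba's
    # module-level default tokenizer.
    tokenizer = jieba.Tokenizer()
    tokenizer.load_userdict(str(userdict_path))
    return tokenizer.lcut(text)


if __name__ == "__main__":
    # Hypothetical traditional-script entries; a real dictionary would
    # be much larger and carry tuned frequencies.
    sample_entries = [("電腦", 10), ("網際網路", 10)]
    with tempfile.TemporaryDirectory() as tmp:
        dict_path = Path(tmp) / "userdict.txt"
        write_userdict(sample_entries, dict_path)
        print(segment_with_userdict("我用電腦連上網際網路", dict_path))
```

The later pipeline steps are not shown. How much the custom entries change the output depends on the frequencies you assign, so treat the numbers above as placeholders.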