Hacker News
by routerl 495 days ago
I did! Jieba is the first step in my segmentation pipeline. As far as I can tell, Jieba's default config tends to work better for Simplified Chinese, but in my case the custom dictionary I feed it has significantly more Traditional entries than Simplified ones, especially for historical terms and slang.
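For anyone curious what feeding Jieba a custom dictionary can look like, here is a rough sketch. It is not the actual pipeline described above: the entries, the frequency value, and the helper names `format_userdict` and `segment` are made up for illustration. It relies only on Jieba's documented user-dictionary format (one word per line, optionally followed by a frequency) and on `jieba.load_userdict` and `jieba.lcut`.

```python
from typing import Iterable, List, Optional, Tuple

# (word, optional frequency). Hypothetical Traditional-script entries,
# chosen only for illustration; not taken from the comment above.
Entry = Tuple[str, Optional[int]]
EXAMPLE_ENTRIES: List[Entry] = [("科舉", None), ("臺灣", 5)]


def format_userdict(entries: Iterable[Entry]) -> str:
    """Render entries in Jieba's user-dictionary format: 'word' or 'word freq' per line."""
    lines = [word if freq is None else f"{word} {freq}" for word, freq in entries]
    return "".join(line + "\n" for line in lines)


def segment(text: str, userdict_path: str) -> List[str]:
    """First pipeline step (assumed): load the custom dictionary, then segment."""
    import jieba  # third-party: pip install jieba (imported lazily on purpose)

    # Reloading on every call is wasteful; a real pipeline would load once.
    jieba.load_userdict(userdict_path)
    return jieba.lcut(text)
```

To try it, write `format_userdict(EXAMPLE_ENTRIES)` to a UTF-8 file and pass that path to `segment`. Later pipeline steps would then work on the returned token list.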