Hacker News
by reubenbond 1763 days ago
The OPUS OpenSubtitles corpus was very useful when I was building this Chinese-English dictionary app: https://github.com/ReubenBond/HanBaoBao. The tool that builds the dictionary database aggregates several sources. One step processes Chinese subtitles for word frequency, which then informs the most likely cuts when performing word segmentation.
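
To illustrate how word frequencies can inform segmentation cuts, here is a minimal sketch. It is not taken from the HanBaoBao code: it assumes a simple unigram model with dynamic programming, and the `segment` function, the `max_word_len` parameter, the smoothing constant, and the toy counts are all hypothetical.

```python
import math

def segment(text, counts, max_word_len=4):
    """Split text into the word sequence with the highest unigram probability.

    counts maps known words to their corpus frequency (hypothetical data,
    e.g. gathered from subtitles). Assumes counts is non-empty.
    """
    total = sum(counts.values())

    def log_prob(word):
        if word in counts:
            return math.log(counts[word] / total)
        if len(word) == 1:
            # Assumed smoothing: unknown single characters get a small
            # probability so that any input can still be segmented.
            return math.log(0.5 / total)
        return None  # unknown multi-character strings are not candidates

    n = len(text)
    # best[i] = (best log-probability of text[:i], start index of its last word)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            lp = log_prob(text[start:end])
            if lp is None:
                continue
            score = best[start][0] + lp
            if score > best[end][0]:
                best[end] = (score, start)

    # Walk the back-pointers from the end of the text to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]

# Toy frequencies, invented for this example.
toy_counts = {"研究": 50, "研究生": 10, "生命": 40, "起源": 20, "命": 5}
print(segment("研究生命起源", toy_counts))  # ['研究', '生命', '起源']
```

With these toy counts, the cut 研究 / 生命 / 起源 beats 研究生 / 命 / 起源 because the product of the word frequencies is higher, which is the sense in which frequency decides between competing cuts.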