by haskal
2617 days ago
You wrote Pingtype? I love Pingtype! It's a pinned tab in my browser because I come across Chinese words so often (fellow learner here!). I think not spacing Chinese is a problem that gets solved through experience. Word groupings are 2-3 characters at most, unlike English, where words can be 1-12 letters long (the exceptions are nouns like 喜马拉雅 = xi3ma3la1ya3 = Himalaya). The examples you gave are edge cases, but I'm sure you can read "butishouldstoptypingnow" just fine. You never thought to segment it as "bu-tish-ould..." because you expect a certain sentence structure and know what kinds of words fill each position in the "<conjunction> <subject> <aux verb> <verb>..." pattern.
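The "butishouldstoptypingnow" point can be illustrated with a dictionary-based segmenter. What follows is a minimal sketch, not how Pingtype actually works: it uses a dynamic-programming split that prefers the fewest words, and both the function `segment` and the toy dictionary `VOCAB` are hypothetical. A real Chinese segmenter would use a large dictionary and usually word frequencies as well.

```python
from typing import Optional

# Hypothetical toy dictionary for illustration only. It deliberately includes
# "bu" and "tish" so the misleading "bu-tish-..." split is a candidate.
VOCAB = {"but", "i", "should", "stop", "typing", "now", "bu", "tish"}


def segment(text: str, vocab: set[str], max_len: int = 12) -> Optional[list[str]]:
    """Split unspaced text into dictionary words, preferring the fewest words.

    Returns None if no complete segmentation exists.
    """
    # best[i] holds the best segmentation found so far for text[:i], or None.
    best: list[Optional[list[str]]] = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        # Only consider candidate words up to max_len characters long.
        for start in range(max(0, end - max_len), end):
            prefix = best[start]
            word = text[start:end]
            if prefix is None or word not in vocab:
                continue
            candidate = prefix + [word]
            if best[end] is None or len(candidate) < len(best[end]):
                best[end] = candidate
    return best[len(text)]


if __name__ == "__main__":
    # prints: but i should stop typing now
    print(" ".join(segment("butishouldstoptypingnow", VOCAB)))
```

The "bu" + "tish" path is explored but dead-ends at "ould", because nothing in the dictionary starts there, so only "but i should stop typing now" covers the whole string. A human reader prunes that path even earlier using grammatical expectations, which this sketch does not model.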

I've wasted so much time on that side project that finally hearing someone else cares about it is really encouraging. Please email me. I've got lots more Pingtype data that I collected and parsed but haven't uploaded yet.