by haskal 2617 days ago
You wrote Pingtype? I love Pingtype! It is a pinned tab on my browser because I come across Chinese words so often (fellow learner here!)

I think the lack of spaces in Chinese is a problem that gets solved through experience. Word groupings are 2-3 characters at most, unlike English, where words can be anywhere from 1 to 12 letters long (the exception is nouns like 喜马拉雅 = xi3ma3la1ya3 = Himalaya).

The examples you gave are edge cases, but I am sure you can read "butishouldstoptypingnow" just fine. You never thought to segment it as "bu-tish-ould..." because you have an expectation of sentence structure and of the types of words that fill each position in the "<conjunction> <subject> <aux verb> <verb>..." format.
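
To make the segmentation point concrete, here is a minimal sketch of dictionary-based segmentation. It is only an illustration, not how Pingtype or any real segmenter works: the `segment` function and the toy `VOCAB` set are hypothetical names made up for this example, and "prefer the split with the fewest words" is a crude stand-in for the grammatical expectations a human reader brings.

```python
def segment(text, vocabulary):
    """Split unspaced text into the fewest vocabulary words, or return None."""
    # best[i] holds the shortest word list that covers text[:i],
    # or None if no combination of vocabulary words reaches position i.
    best = [None] * (len(text) + 1)
    best[0] = []
    max_len = max(len(word) for word in vocabulary)

    for end in range(1, len(text) + 1):
        # Only look back as far as the longest known word.
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if best[start] is None or word not in vocabulary:
                continue
            candidate = best[start] + [word]
            if best[end] is None or len(candidate) < len(best[end]):
                best[end] = candidate

    return best[len(text)]


# Toy vocabulary (an assumption for illustration). It deliberately includes
# distractors such as "top", "no", "ty" and "ping".
VOCAB = {"but", "i", "should", "stop", "typing", "now", "top", "no", "ty", "ping"}

print(segment("butishouldstoptypingnow", VOCAB))
```

A fragment like "tish" never shows up in the result because it is not in the vocabulary, which is roughly the role experience plays for a reader. The same function works on Chinese text if the vocabulary holds Chinese words, since it only compares substrings.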

1 comment

Wow, really? I have a user?! You seriously just made my day!

I've wasted so much time on that side project that finally hearing someone else cares about it is so encouraging. Please email me - I've got lots more Pingtype data that I collected and parsed but haven't uploaded yet.