by function_seven 4076 days ago
The inquisitive Lt. Function_Seven asked, "How would the script know where one sentence ends and another begins?" as he began typing his query into the Yahoo! Search toolbar.

:) I think you just made the case for bringing back the two spaces after a period rule!

2 comments

FWIW, basic machine learning approaches to "sentence boundary detection" (as the task is called) get 199 out of 200 of these right (without using the "two space" clue), and have for a while. (e.g., http://sonny.cslu.ohsu.edu/~gormanky/blog/simpler-sentence-b...)
For the purpose of version control, it doesn't even have to be exact. It doesn't matter if the detector inserts an incorrect line break after a certain combination of characters, as long as it does so consistently: the same text always breaks the same way, so the diff stays readable.
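To make the "consistent, not exact" point concrete, here is a rough sketch of a deterministic splitter that puts one sentence-like chunk per line. The rule (break after `.`, `!` or `?` when whitespace and a capital letter or quote follow) and the names `split_sentences` and `one_sentence_per_line` are my own assumptions for illustration, not anything from the linked post.

```python
import re

# Assumed naive rule: a boundary is whitespace that follows . ! or ?
# and precedes an uppercase ASCII letter or an opening quote.
_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[A-Z"\'])')

def split_sentences(text):
    """Split text into sentence-like chunks using a fixed, deterministic rule."""
    return [chunk for chunk in _BOUNDARY.split(text.strip()) if chunk]

def one_sentence_per_line(text):
    """Join the chunks with newlines so each one gets its own line in a diff."""
    return "\n".join(split_sentences(text))

if __name__ == "__main__":
    old = one_sentence_per_line("Dr. Smith went home. He slept.")
    new = one_sentence_per_line("Dr. Smith went home. He slept well.")
    print(old)
    print("---")
    print(new)
```

It wrongly breaks after "Dr.", but it does so in both versions, so only the line that was actually edited differs between them.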

    Ha.  You might be right.