Hacker News new | ask | show | jobs
by iamcurious 4076 days ago
Not sure what is the right way to do it. But in principle it shouldn't be a problem. An script could make a copy of the files but with one sentence per line. So you could edit the original and then uae the transformed version for version control.
1 comments

The inquisitive Lt. Function_Seven asked, "How would the script know where one sentence ends and another begins?" as he began typing his query into the Yahoo! Search toolbar.

:) I think you just made the case for bringing back the two spaces after a period rule!

FWIW, basic machine learning approaches to "sentence boundary detection" (as the task is called) get 199 out of 200 of these right (without using the "two space" clue), and have for a while. (e.g., http://sonny.cslu.ohsu.edu/~gormanky/blog/simpler-sentence-b...)
For the purpose of version control, it doesn't even have to be exact. It doesn't matter if the detector inserts an incorrect line break after a certain combination of characters, as long as it does so consistently so that it produces a readable diff.

    Ha.  You might be right.