by sampo 4077 days ago
> Another huge reason to use plain text that he didn't mention is version control.

But version control tools are designed for code, i.e. showing which lines have been edited. With English text one would rather want to see which sentences have been edited. Are there tools for this? (Except MS Word's track changes feature.)

Well, one could write one sentence per one line, but that makes a pretty ugly txt document, when viewed raw.


> Well, one could write one sentence per one line, but that makes a pretty ugly txt document, when viewed raw.

Many of the tech writers I work with advocate exactly this.

In my stuff, I just hard line wrap the text. Diffs do tend to have more spurious whitespace changes because of this than I'd like, but that's still miles better than a completely opaque binary format like Word.

Not to advocate for Word or anything, but technically it's a zip of XML and other stuff (images, etc.) that gets pulled in through ... OLE(??). VC + Markdown/LaTeX is excellent for collaboration or branching drafts.
Once I read "Semantic Linefeeds" (http://rhodesmill.org/brandon/2012/one-sentence-per-line/) I've been experimenting with breaking on punctuation. Yes, it makes the raw text look a bit odd (check the source on http://boston.conman.org/2015/04/16.1) but I've found it much easier to edit (especially when my girlfriend emails me corrections like spelling errors, typos, incorrect grammar, etc).
For the use case of prose, this is a great alternative to the time investment needed to take up a heavyweight editor (e.g. Emacs or Vim) that can be made to operate on a clause-by-clause, sentence-by-sentence basis, and I recommend it to anyone not interested in taking the plunge into "customization culture" or using the other features those programs provide. My writing, when I don't need to use Word for work (thanks to co-workers who use it for everything), tends to be done in something unobtrusive like nano or sandy[0] and looks much like the source from your second link, minus the HTML.

"Easy to edit," to take a phrase from your first link, is key.

[0]: http://tools.suckless.org/sandy

Not sure what the right way to do it is. But in principle it shouldn't be a problem. A script could make a copy of the files but with one sentence per line. So you could edit the original and then use the transformed version for version control.
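
A minimal sketch of what such a script might look like in Python. The function name `one_sentence_per_line` and the boundary rule (a sentence ends at `.`, `!` or `?` followed by whitespace) are assumptions made for illustration, and the rule is deliberately naive.

```python
import re

# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def one_sentence_per_line(text: str) -> str:
    """Reflow prose so each (naively detected) sentence sits on its own line."""
    # Paragraphs are separated by one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    reflowed = []
    for para in paragraphs:
        # Undo hard wrapping first, then break at the sentence boundaries.
        joined = " ".join(para.split())
        reflowed.append("\n".join(_BOUNDARY.split(joined)))
    return "\n\n".join(reflowed) + "\n"
```

The transformed copy would be the one that gets committed, while the hard-wrapped original stays as the file you actually edit.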
The inquisitive Lt. Function_Seven asked, "How would the script know where one sentence ends and another begins?" as he began typing his query into the Yahoo! Search toolbar.

:) I think you just made the case for bringing back the two spaces after a period rule!

FWIW, basic machine learning approaches to "sentence boundary detection" (as the task is called) get 199 out of 200 of these right (without using the "two space" clue), and have for a while. (e.g., http://sonny.cslu.ohsu.edu/~gormanky/blog/simpler-sentence-b...)
For the purpose of version control, it doesn't even have to be exact. It doesn't matter if the detector inserts an incorrect line break after a certain combination of characters, as long as it does so consistently so that it produces a readable diff.
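
A rough illustration of that point, as a sketch only. The helper names `split_naively` and `sentence_diff` are made up for this example, and the break rule is a plain regex rather than a real sentence boundary detector. It wrongly breaks after "Lt.", but it does so in both versions, so the diff still isolates the one sentence that changed.

```python
import difflib
import re


def split_naively(text: str) -> list[str]:
    # Hypothetical rule: break after '.', '!' or '?' followed by whitespace.
    return re.split(r"(?<=[.!?])\s+", " ".join(text.split()))


def sentence_diff(old: str, new: str) -> list[str]:
    """Return removed ('-') and added ('+') lines between two split texts."""
    a, b = split_naively(old), split_naively(new)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            out += ["-" + line for line in a[i1:i2]]
        if tag in ("replace", "insert"):
            out += ["+" + line for line in b[j1:j2]]
    return out


old = "Lt. Function_Seven asked a question. Nobody answered."
new = "Lt. Function_Seven asked a question. Somebody answered."
changes = sentence_diff(old, new)
```

Here `changes` holds just the removed and the added sentence; the bogus "Lt." line appears identically on both sides and so never shows up in the diff.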

    Ha.  You might be right.
git diff --word-diff=color

Not exactly sentence-level, but perhaps good enough for some...
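
For anyone who wants the same idea outside git, here is a small Python sketch of a word-level diff. It is not how git implements `--word-diff`; it only borrows the `[-removed-]` and `{+added+}` markers that git's plain word-diff mode prints, and the function name `word_diff` is made up for this example. It also collapses whitespace, so line breaks in the input are not preserved.

```python
import difflib


def word_diff(old: str, new: str) -> str:
    """Mark removed words as [-...-] and added words as {+...+}."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.extend(a[i1:i2])
        if tag in ("replace", "delete"):
            parts.append("[-" + " ".join(a[i1:i2]) + "-]")
        if tag in ("replace", "insert"):
            parts.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(parts)
```

Because it compares words rather than lines, re-wrapping a paragraph does not register as a change at all.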