by explosion 4076 days ago
Another huge reason to use plain text that he didn't mention is version control. With plain text, you can check the files into a git repo and diff any two commits. You then get all the advantages of working with version-controlled code, e.g. commenting on diffs.

Word and Google Docs do save revision history, but they slow to a crawl when you have a document with a lot of pages, to the point of being unworkable.
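To make the "diff any two commits" point concrete, here is a minimal sketch in Python. It uses the standard library's difflib rather than git itself, and the file names and sample text are made up for illustration. The output is a unified diff, similar in shape to what git shows for a changed file.

```python
import difflib


def line_diff(old: str, new: str) -> list[str]:
    """Return a unified diff of two plain-text revisions, line by line."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="draft_v1.txt",  # hypothetical file names
            tofile="draft_v2.txt",
            lineterm="",
        )
    )


# Two made-up revisions of the same note.
old_text = "Plain text is portable.\nIt works with any editor.\n"
new_text = "Plain text is portable.\nIt works with any editor and with git.\n"

for line in line_diff(old_text, new_text):
    print(line)
```

Unchanged lines are prefixed with a space, removed lines with "-", and added lines with "+", which is what makes commenting on individual changes practical.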


I just logged in to say this. It's an incredibly powerful feature.

In general, plain text is great because it's explicit. In contrast, word processors store some odd chain of objects. The steps to rebuild a document from scratch that lead to the same representation are not totally obvious. Sometimes it's a bit stochastic even!

XML formats improved the situation, but still plain text wins IMHO. With plain text you can have a toolchain where components for version control, editing and transformation can be replaced seamlessly. New stuff can be introduced at will. In e.g. Word you're mostly locked in, or at least it's far from trivial to break out.

Original author here. Totally agree about version control. When I wrote that post I had not yet learned how to use Git. In a follow-up post I discuss how I now manage all of my notes and citations in plain-text on a Pandoc- and git-powered wiki called Gitit.

http://wcm1.web.rice.edu/plain-text-citations.html

I appreciate that you mentioned LaTeX and its potential usefulness for these scenarios. You get the ability to version control, without making things especially difficult to translate your text into a more appealing output. Granted, one has to learn typesetting semantics first, but it seems to be a good middle ground.
wcaleb, this is great, thanks for both posts. It's also very inspiring to see your open notebook workflow and experiments.
There's a very timely HN post discussing a vim implementation on top of emacs: https://news.ycombinator.com/item?id=9394144

As much as I like vi's interface, I think emacs implementation is superior, and org-mode invaluable.

Looks great, just watched the video. Have you considered org-mode?
I tried org-mode for a while, but found it overly complicated. I use Emacs heavily, but developed a couple of Pandoc extensions which give me the org-mode features I need without all of the baggage: http://chriswarbo.net/essays/activecode/index.html
Have looked at, but not really considered. I went down the Vim path pretty early on ...
You can use org-mode with Vim more or less the same as you'd use Markdown if you're planning to compile your doc with Pandoc. You can even use README.org on github in place of README.md, and it will be turned into HTML automagically the same way as Markdown.

Why use org-mode as a format instead of Markdown? Internal cross-references. Org-mode feels just as light weight to me as Markdown for the most part, but it has fewer cases where you can't make things work. Worst case in org-mode, you mix in LaTeX.

org-mode is decent for LaTeX markdown substitution. * = Section, ** = SubSection

Etc.

I think I still prefer to just write it in markdown. Word can't come close to the typesetting ability of LaTeX.

Looks great, thanks a lot for sharing your workflow !
Agreed. This is the same reason why some of us are skeptical of the frequent calls for more "visual" programming languages that eschew textual representations of code.
Thinking about block based "IDEs"?

I think the biggest difference there is that a document is "static". It is not supposed to process something else like code is.

Thus what you want to avoid at all costs is "boiler plate" bugs. Bugs that are not caught at compile time because it is semantically correct, but that end up producing wild results once a certain kind of data is introduced.

> In contrast, word processors store some odd chain of objects. The steps to rebuild a document from scratch that leads to the same representation are not totally obvious.

Microsoft Word's binary format (.doc) stored the document's text as a block in the file, making data recovery simple.

I seem to recall older word processors having an "explicit" mode that allowed the editing of markup directly. But I don't know if that is still present, much less used, in the age of "drag and drop".
> Another huge reason to use plain text that he didn't mention is version control.

But version control tools are designed for code, i.e. showing which lines have been edited. With English text one would rather want to see which sentences have been edited. Are there tools for this? (Except MS Word's track changes feature.)

Well, one could write one sentence per one line, but that makes a pretty ugly txt document, when viewed raw.

> Well, one could write one sentence per one line, but that makes a pretty ugly txt document, when viewed raw.

Many of the tech writers I work with advocate exactly this.

In my stuff, I just hard line wrap the text. Diffs do tend to have more spurious whitespace changes because of this than I'd like, but that's still miles better than a completely opaque binary format like Word.
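A rough way to see the spurious-change effect of hard wrapping, sketched in Python with the standard library only. The sample sentences, the wrap width, and the helper names changed_lines and hard_wrap are all made up for illustration; the idea is just to count how many lines a line-based diff flags after the same one-word edit under each layout.

```python
import difflib
import textwrap


def changed_lines(old: str, new: str) -> int:
    """Count the lines a line-based diff marks as added or removed."""
    delta = difflib.ndiff(old.splitlines(), new.splitlines())
    return sum(1 for d in delta if d.startswith(("+ ", "- ")))


def hard_wrap(text: str, width: int = 40) -> str:
    """Re-flow a paragraph to a fixed width, as an editor's wrap command would."""
    return textwrap.fill(" ".join(text.split()), width=width)


# Made-up paragraph; the edit adds one word at the very start.
before = "Plain text diffs well. Word files do not. That matters for long drafts."
after = "Honestly, plain text diffs well. Word files do not. That matters for long drafts."

print("hard-wrapped:", changed_lines(hard_wrap(before, 30), hard_wrap(after, 30)))
print(
    "one sentence per line:",
    changed_lines(before.replace(". ", ".\n"), after.replace(". ", ".\n")),
)
```

With hard wrapping, an early insertion can push words onto every following line of the paragraph, so the diff touches lines whose wording never changed. With one sentence per line, only the edited sentence shows up.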

Not to advocate for Word or anything, but technically it's a zip of XML and other stuff (images, etc.) that gets pulled in through ... OLE(??). VC + markdown/LaTeX is excellent for collaboration or branching drafts.
Once I read "Semantic Linefeeds" (http://rhodesmill.org/brandon/2012/one-sentence-per-line/) I've been experimenting with breaking on punctuation. Yes, it makes the raw text look a bit odd (check the source on http://boston.conman.org/2015/04/16.1) but I've found it much easier to edit (especially when my girlfriend emails me corrections like spelling errors, typos, incorrect grammar, etc).
For the use case of prose, this is a great alternative to the time investment needed to take up a heavyweight editor (e.g. Emacs or Vim) that can be made to operate on a clause-by-clause, sentence-by-sentence basis, and I recommend it to anyone not interested in taking the plunge into "customization culture" or using the other features those programs provide. My writing, when I don't need to use Word for work (thanks to co-workers who use it for everything), tends to be done in something unobtrusive like nano or sandy[0] and looks much like the source from your second link, minus the HTML.

"Easy to edit," to take a phrase from your first link, is key.

[0]: http://tools.suckless.org/sandy

Not sure what is the right way to do it. But in principle it shouldn't be a problem. A script could make a copy of the files but with one sentence per line. So you could edit the original and then use the transformed version for version control.
The inquisitive Lt. Function_Seven asked, "How would the script know where one sentence ends and another begins?" as he began typing his query into the Yahoo! Search toolbar.

:) I think you just made the case for bringing back the two spaces after a period rule!

FWIW, basic machine learning approaches to "sentence boundary detection" (as the task is called) get 199 out of 200 of these right (without using the "two space" clue), and have for a while. (e.g., http://sonny.cslu.ohsu.edu/~gormanky/blog/simpler-sentence-b...)
For the purpose of version control, it doesn't even have to be exact. It doesn't matter if the detector inserts an incorrect line break after a certain combination of characters, as long as it does so consistently so that it produces a readable diff.
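A minimal sketch of such a script in Python, assuming a deliberately naive rule: break after ".", "!" or "?" whenever whitespace follows. This is not the machine learning approach mentioned above, and the function name one_sentence_per_line is made up. It will break wrongly after abbreviations like "Lt." or names like "Yahoo!", but it does so the same way on every run, which is the property that matters for a readable diff.

```python
import re

# Naive boundary rule (an assumption for illustration, not a real
# sentence boundary detector): whitespace that follows '.', '!' or '?'.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def one_sentence_per_line(text: str) -> str:
    """Re-flow text so each (naively detected) sentence sits on its own line.

    Paragraphs, separated by blank lines, are kept apart.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    reflowed = []
    for paragraph in paragraphs:
        flat = " ".join(paragraph.split())  # undo any hard wrapping
        reflowed.append("\n".join(_BOUNDARY.split(flat)))
    return "\n\n".join(reflowed)


sample = "Hello world. How are you?\nFine, thanks!"
print(one_sentence_per_line(sample))
```

Running the function on its own output gives the same text back, so the transformed copy stays stable between commits unless the prose itself changes.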

    Ha.  You might be right.
git diff --word-diff=color

Not exactly sentence-level, but perhaps good enough for some...
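For anyone curious what a word-level diff boils down to, here is a small Python sketch using difflib from the standard library. It is not how git implements --word-diff; it only imitates the [-removed-] and {+added+} markers of git's plain word-diff output, and the function name word_diff is made up.

```python
import difflib


def word_diff(old: str, new: str) -> str:
    """Mark removed words as [-...-] and added words as {+...+}."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
            continue
        if i2 > i1:  # words present only in the old text
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if j2 > j1:  # words present only in the new text
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


print(word_diff("the quick brown fox", "the slow brown fox"))
```

Because it splits on whitespace, this sketch ignores line breaks entirely, so re-wrapping a paragraph produces no changes at all.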

This. I still have an svn repository with a bunch of my college history papers in it. I want to think I used line-wrapping in my editor, so each paragraph would end up being a single line. Never had much trouble diffing to check what sentences/phrases I had added between edits.

Probably today I'd want to use Markdown, since I ended up doing something similar anyway to put in my footnotes as I was going. The worst part was always the hour or two of futzing around in Word to format everything for PDF export and printing, since it never came out quite the way it looked on the screen in Word 2003.

convinced my SO to use plain text editor for academic writings with these points:

no crashes

can edit anywhere. even on a dumb phone.

revision control (i provided two scripts to the right click context menu to pull/push)

and the killer one: output to several standards. iso, abnt, eu-whatever... even automatic html formatting for the blog if a smaller article is not accepted anywhere.

though that happened before markdown et al became popular. so latex it is.

> can edit anywhere. even on a dumb phone.

This makes me wonder how old you are ;-)

Word does have some kind of feature like this, does it not? I have seen documents with red changes marked at the side. I never worked out how to use it, but then Git is hardly intuitive either.
Couldn't agree more. I'm very old school, I still prefer troff over latex but I'll take either over word because of version control.