| HN Mirror



	by explosion 4076 days ago
	Another huge reason to use plain text that he didn't mention is version control. With plain text, you can check the files into a git repo and diff any two commits. You then get all the advantages of working with version-controlled code, e.g. commenting on diffs. Word and Google Docs do save revision history, but they slow to a crawl when you have a document with a lot of pages, to the point of being unworkable.

6 comments

nextos 4076 days ago

I just logged in to say this. It's an incredibly powerful feature.

In general, plain text is great because it's explicit. In contrast, word processors store some odd chain of objects. The steps to rebuild a document from scratch that lead to the same representation are not totally obvious. Sometimes it's a bit stochastic even!

XML formats improved the situation, but still plain text wins IMHO. With plain text you can have a toolchain where components for version control, editing and transformation can be replaced seamlessly. New stuff can be introduced at will. In e.g. Word you're mostly locked in, or at least it's far from trivial to break out.

wcaleb 4076 days ago

Original author here. Totally agree about version control. When I wrote that post I had not yet learned how to use Git. In a follow-up post I discuss how I now manage all of my notes and citations in plain-text on a Pandoc- and git-powered wiki called Gitit.

http://wcm1.web.rice.edu/plain-text-citations.html

kjax 4076 days ago

I appreciate that you mentioned LaTeX and its potential usefulness for these scenarios. You get the ability to version control, without making things especially difficult to translate your text into a more appealing output. Granted, one has to learn typesetting semantics first, but it seems to be a good middle ground.

explosion 4076 days ago

wcaleb, this is great, thanks for both posts. It's also very inspiring to see your open notebook workflow and experiments.

nextos 4076 days ago

There's a very timely HN post discussing a vim implementation on top of emacs: https://news.ycombinator.com/item?id=9394144

As much as I like vi's interface, I think the Emacs implementation is superior, and org-mode is invaluable.

nextos 4076 days ago

Looks great, just watched the video. Have you considered org-mode?

chriswarbo 4076 days ago

I tried org-mode for a while, but found it overly complicated. I use Emacs heavily, but developed a couple of Pandoc extensions which give me the org-mode features I need without all of the baggage: http://chriswarbo.net/essays/activecode/index.html

wcaleb 4076 days ago

Have looked at, but not really considered. I went down the Vim path pretty early on ...

peatmoss 4076 days ago

You can use org-mode with Vim more or less the same as you'd use Markdown if you're planning to compile your doc with Pandoc. You can even use README.org on github in place of README.md, and it will be turned into HTML automagically the same way as Markdown.

Why use org-mode as a format instead of Markdown? Internal cross-references. Org-mode feels just as light weight to me as Markdown for the most part, but it has fewer cases where you can't make things work. Worst case in org-mode, you mix in LaTeX.

CyberpunkDad 4076 days ago

org-mode is decent for LaTeX markdown substitution. * = Section, ** = SubSection

Etc.

I think I still prefer to just write it in Markdown. Word can't come close to the typesetting ability of LaTeX.

marvel_boy 4076 days ago

Looks great, thanks a lot for sharing your workflow !

the_af 4076 days ago

Agreed. This is the same reason why some of us are skeptical of the frequent calls for more "visual" programming languages that eschew textual representations of code.

digi_owl 4076 days ago

Thinking about block based "IDEs"?

I think the biggest difference there is that a document is "static". It is not supposed to process something else like code is.

Thus what you want to avoid at all costs is "boiler plate" bugs. Bugs that are not caught at compile time because it is semantically correct, but that end up producing wild results once a certain kind of data is introduced.

hackuser 4076 days ago

> In contrast, word processors store some odd chain of objects. The steps to rebuild a document from scratch that lead to the same representation are not totally obvious.

Microsoft Word's binary format (.doc) stored the document's text as a block in the file, making data recovery simple.

digi_owl 4076 days ago

I seem to recall older word processors having an "explicit" mode that allowed the editing of markup directly. But I don't know if that is still present, much less used, in the age of "drag and drop".

sampo 4076 days ago

> Another huge reason to use plain text that he didn't mention is version control.

But version control tools are designed for code, i.e. showing which lines have been edited. With English text one would rather want to see which sentences have been edited. Are there tools for this? (Except MS Word's track changes feature.)

Well, one could write one sentence per one line, but that makes a pretty ugly txt document, when viewed raw.

munificent 4076 days ago

> Well, one could write one sentence per one line, but that makes a pretty ugly txt document, when viewed raw.

Many of the tech writers I work with advocate exactly this.

In my stuff, I just hard line wrap the text. Diffs do tend to have more spurious whitespace changes because of this than I'd like, but that's still miles better than a completely opaque binary format like Word.
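
As a rough illustration of the trade-off described above (not munificent's actual setup), the sketch below counts how many lines a line-based diff flags after a small edit, first with a hard-wrapped paragraph and then with one sentence per line. The sample sentences, the 40-column width, and the `changed_lines` helper are all made up for the example.

```python
import difflib
import textwrap


def changed_lines(old_lines, new_lines):
    """Count the lines a line-based diff marks as removed or added."""
    diff = difflib.ndiff(old_lines, new_lines)
    return sum(1 for line in diff if line.startswith(("- ", "+ ")))


# Hypothetical sample text, one sentence per list item.
sentences = [
    "Plain text works well with version control.",
    "Every commit records exactly which lines changed.",
    "Reviewers can comment on a diff instead of rereading the whole draft.",
]
# Hypothetical edit: two extra words in the first sentence only.
edited = ["Plain text works very well with modern version control."] + sentences[1:]

# Layout 1: a hard-wrapped paragraph that is re-wrapped after the edit.
wrapped_old = textwrap.wrap(" ".join(sentences), width=40)
wrapped_new = textwrap.wrap(" ".join(edited), width=40)

# Layout 2: one sentence per line, so only the edited sentence moves.
print("hard-wrapped:     ", changed_lines(wrapped_old, wrapped_new))
print("sentence per line:", changed_lines(sentences, edited))
```

With this sample the sentence-per-line layout reports two changed lines (one removed, one added), while the re-wrapped paragraph reports more because the edit shifts every later line break.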

tomsthumb 4076 days ago

Not to advocate for Word or anything, but technically it's a zip of XML and other stuff (images, etc.) that gets pulled in through ... OLE(??). VC + Markdown/LaTeX is excellent for collaboration or branching drafts.

spc476 4076 days ago

Once I read "Semantic Linefeeds" (http://rhodesmill.org/brandon/2012/one-sentence-per-line/) I've been experimenting with breaking on punctuation. Yes, it makes the raw text look a bit odd (check the source on http://boston.conman.org/2015/04/16.1) but I've found it much easier to edit (especially when my girlfriend emails me corrections like spelling errors, typos, incorrect grammar, etc).

skigg 4076 days ago

For the use case of prose, this is a great alternative to the time investment needed to take up a heavyweight editor (e.g. Emacs or Vim) that can be made to operate on a clause-by-clause, sentence-by-sentence basis, and I recommend it to anyone not interested in taking the plunge into "customization culture" or using the other features those programs provide. My writing, when I don't need to use Word for work (thanks to co-workers who use it for everything), tends to be done in something unobtrusive like nano or sandy[0] and looks much like the source from your second link, minus the HTML.

"Easy to edit," to take a phrase from your first link, is key.

[0]: http://tools.suckless.org/sandy

iamcurious 4076 days ago

Not sure what the right way to do it is. But in principle it shouldn't be a problem. A script could make a copy of the files but with one sentence per line. So you could edit the original and then use the transformed version for version control.

function_seven 4076 days ago

The inquisitive Lt. Function_Seven asked, "How would the script know where one sentence ends and another begins?" as he began typing his query into the Yahoo! Search toolbar.

:) I think you just made the case for bringing back the two spaces after a period rule!

kylebgorman 4076 days ago

FWIW, basic machine learning approaches to "sentence boundary detection" (as the task is called) get 199 out of 200 of these right (without using the "two space" clue), and have for a while. (e.g., http://sonny.cslu.ohsu.edu/~gormanky/blog/simpler-sentence-b...)

kijin 4076 days ago

For the purpose of version control, it doesn't even have to be exact. It doesn't matter if the detector inserts an incorrect line break after a certain combination of characters, as long as it does so consistently so that it produces a readable diff.
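
A minimal sketch of that idea, assuming a deliberately naive rule: break after `.`, `!` or `?` whenever whitespace follows. The function name `one_sentence_per_line` and the regex are illustrative assumptions, not anything posted in this thread, and this is not the machine learning approach linked above. It gets abbreviations such as "Lt." wrong, but it gets them wrong the same way every time, which is all a readable diff needs.

```python
import re

# Naive boundary: sentence-ending punctuation followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def one_sentence_per_line(text):
    """Reflow text so each (approximate) sentence sits on its own line."""
    flat = " ".join(text.split())  # collapse existing wraps and extra spaces
    return "\n".join(_BOUNDARY.split(flat))


sample = "The inquisitive Lt. Smith asked a question. Then he left."
print(one_sentence_per_line(sample))
```

Running the function on its own output gives the same text back, so the transformed copy can be regenerated before each commit without introducing spurious changes.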

iamcurious 4076 days ago

    Ha.  You might be right.

anoother 4076 days ago

git diff --word-diff=color

Not exactly sentence-level, but perhaps good enough for some...
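
To show roughly what a word-level diff reports, here is a small approximation built on Python's `difflib`. It is only a sketch: it does not reproduce Git's own diff algorithm, and `word_diff` is a hypothetical helper name. The `[-removed-]` and `{+added+}` markers imitate the plain-text style Git uses for word diffs.

```python
import difflib


def word_diff(old, new):
    """Mark removed words as [-word-] and added words as {+word+}."""
    a, b = old.split(), new.split()
    out = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
            continue
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


print(word_diff("plain text is good", "plain text is great"))
# prints: plain text is [-good-] {+great+}
```

Because the comparison runs on words rather than lines, re-wrapping a paragraph does not show up as a change; only the words that actually differ do.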

mattalbie 4076 days ago

http://rhodesmill.org/brandon/2012/one-sentence-per-line/

douche 4076 days ago

This. I still have an svn repository with a bunch of my college history papers in it. I want to think I used line-wrapping in my editor, so each paragraph would end up being a single line. Never had much trouble diffing to check what sentences/phrases I had added between edits.

Probably today I'd want to use Markdown, since I ended up doing something similar anyway to put in my footnotes as I was going. The worst part was always the hour or two of futzing around in Word to format everything for PDF export and printing, since it never came out quite the way it looked on the screen in Word 2003.

gcb0 4076 days ago

convinced my SO to use plain text editor for academic writings with these points:

no crashes

can edit anywhere. even on a dumb phone.

revision control (i provided two scripts to the right click context menu to pull/push)

and the killer one: output to several standards. iso, abnt, eu-whatever... even automatic html formatting for the blog if a smaller article is not accepted anywhere.

though that happened before markdown et al became popular. so latex it is.

dorfsmay 4076 days ago

> can edit anywhere. even on a dumb phone.

This makes me wonder how old you are ;-)

collyw 4075 days ago

Word does have some kind of feature like this, does it not? I have seen documents with red changes marked at the side. I never worked out how to use it, but then Git is hardly intuitive either.

luckydude 4076 days ago

Couldn't agree more. I'm very old school, I still prefer troff over latex but I'll take either over word because of version control.