Hacker News
by kookamamie 411 days ago
> The Python plaintext library

Then it goes on to present a convoluted node-based system for managing content. Why call it "plaintext" when it clearly has nothing to do with plain text? Perhaps describing it as some kind of Markdown alternative, i.e. a markup language, would make more sense?

"There are only two hard things in Computer Science: cache invalidation and naming things"
"... and off-by-one errors"
That's just off by one.
Markdown and other markup languages are plaintext.
By that logic, aren't almost all languages plaintext?
You can even go a step further: arguably, in the Unicode era, there is no such thing as "plain text" anymore. Nominally "plain text" has always had some markup in it, such as newlines and tabs, but OK, sure, we can fold those into what is perhaps slightly misnamed but is still a "simplest possible format" with a lot of useful characteristics: it mostly fit into a grid (except for tabs), it naturally suited a monospace font, it could be dumped to a terminal, it often contained only 7-bit ASCII characters and, if it was 8-bit, the encoding was specified externally somehow, it was monochrome, etc. etc. There are some ways this strains the concept of "plain text", but since it wasn't a moving target for a couple of decades (modulo perhaps some 8-bit encoding issues), English speakers at least could pretty much agree on what "plain text" meant.
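
To make the encoding and grid points concrete, here is a small Python sketch using only built-in codecs; the byte strings are arbitrary examples, not anything from the plaintext library. Bytes below 0x80 decode identically under ASCII and common 8-bit code pages, while a single high byte means different characters depending on which encoding you were told to assume. Tabs are the grid exception because their width depends on the tab-stop setting.

```python
# 7-bit ASCII bytes (all below 0x80) decode identically under these codecs.
ascii_bytes = b"plain text\n"
for codec in ("ascii", "latin-1", "cp437"):
    assert ascii_bytes.decode(codec) == "plain text\n"

# A byte above 0x7F means nothing until someone tells you the encoding.
high_byte = b"\xe9"
as_latin1 = high_byte.decode("latin-1")  # 'é' (U+00E9)
as_cp437 = high_byte.decode("cp437")     # 'Θ' (U+0398) in the IBM PC code page
print(as_latin1, as_cp437)

try:
    high_byte.decode("ascii")
except UnicodeDecodeError:
    print("not 7-bit ASCII")

# Tabs are the exception to the grid: their width depends on the tab stops.
row = "id\tsize"
print(repr(row.expandtabs(8)))  # 'id      size'
print(repr(row.expandtabs(4)))  # 'id  size'
```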

But even "plain text" Unicode now breaks a huge number of those assumptions. A number of Unicode characters have defined widths, like all the various spaces (en space, em space, thin space, and so on). Kanji are broadly defined to be twice the width of an English character in a monospaced font, and that is subject to a number of exceptions too. There are directional markup characters like the Right-to-Left and Left-to-Right marks and the Arabic Letter Mark. Emoji are not exactly intrinsically non-monochrome, but they aren't exactly intrinsically monochrome either (your users may have some objections). Zero-width joiners have complicated semantics that go well beyond just "a zero-width space". You have to handle combining sequences like e + combining acute accent in addition to the precomposed é itself, and you have to render arbitrary numbers of combining marks to even remotely properly handle Zalgo text. You have to worry about font glyph support in a way that you didn't in a 256-character world.
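
A short Python illustration of two of those points, using the standard `unicodedata` module. The `naive_cell_width` helper is hypothetical and deliberately simplified: it only knows about combining marks, format characters such as ZWJ and the directional marks, and East Asian wide/fullwidth characters, so it ignores emoji presentation, ambiguous-width characters, and the many other exceptions mentioned above.

```python
import unicodedata

precomposed = "\u00e9"  # é as one code point (LATIN SMALL LETTER E WITH ACUTE)
combining = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# Same rendered glyph, different code points: naive comparison and len() disagree.
print(precomposed == combining)          # False
print(len(precomposed), len(combining))  # 1 2
# Normalizing to NFC composes the pair, so the two then compare equal.
print(unicodedata.normalize("NFC", combining) == precomposed)  # True


def naive_cell_width(text: str) -> int:
    """Rough count of monospace terminal cells (hypothetical, simplified sketch)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch) or unicodedata.category(ch) == "Cf":
            # Combining marks and format characters (ZWJ, LRM, RLM, ALM) take no cell.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide / fullwidth, e.g. kanji
        else:
            width += 1
    return width


print(naive_cell_width("cafe\u0301"))         # 4
print(naive_cell_width("漢字"))               # 4
print(naive_cell_width("a\u200db"))           # 2
print(naive_cell_width("e" + "\u0301" * 20))  # 1 (Zalgo-style stacking)
```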

Even text with no markup has mandatory markup in it now; it may not be "bold" or "italic" markup, but if you want to render it even remotely properly, the minimal code necessary to do so is rather more complicated than minimal bold/italic support. Unicode no longer really has that "we all agree on the defaults, so we can just dump it to the screen, do the simplest possible render, and all agree that's what it should look like" property.
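
One small illustration of why code-point-at-a-time handling is no longer "the simplest possible" approach: reversing a string by code points detaches a combining accent from its letter. The `reverse_keeping_marks` helper below is a hypothetical sketch that only groups a base character with its trailing combining marks; it is not full grapheme segmentation, which (per Unicode's text segmentation rules) also has to keep ZWJ emoji sequences, flags, and more together.

```python
import unicodedata


def naive_reverse(text: str) -> str:
    # Reverses code points, so a combining mark ends up before (and detached from) its base.
    return text[::-1]


def reverse_keeping_marks(text: str) -> str:
    # Group each base character with any combining marks that follow it,
    # then reverse the groups instead of the raw code points.
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


word = "cafe\u0301"  # 'café' spelled with a combining acute accent
print(ascii(naive_reverse(word)))          # '\u0301efac'
print(ascii(reverse_keeping_marks(word)))  # 'e\u0301fac'
```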

Yes, this is nonsensical or contradictory because GP is not aware of the correct definition.

https://news.ycombinator.com/item?id=28105868

> Plain text: text without mark-up

What is the utility of such a definition? As far as I am concerned, anything I can read with my editor is plain text. That definition is trivially useful on a daily basis. I don't see any point in calling markdown something other than plain text. Because it's just plain text.

And of course, I intend deep disrespect that you had the gall to claim correctness for such obviously arbitrary definitions.

Yep, I call it the notepad/nano rule.

If I can open your file in Notepad (or nano) and effectively edit it without the format changing or getting corrupted on save, it's plain text.

Not plain text: Attempting to edit an .exe file in notepad.

Unfriendly plain text: Minified JavaScript, where the entire file sits on one line and the human-readable elements are jumbled together.

Plain text: Your average source code or HTML file that tries to stay within roughly 80-120 columns of text.
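
The notepad/nano rule can be approximated mechanically. The `classify_text` function below is a hypothetical heuristic, not any standard test: it assumes UTF-8, treats undecodable bytes and NUL or other C0 control characters as a sign of binary data (like an .exe), and uses a 120-column threshold, taken from the range above, to flag one-line minified files as "unfriendly". It counts code points rather than true display columns.

```python
def classify_text(data: bytes, max_columns: int = 120) -> str:
    """Hypothetical heuristic for the notepad/nano rule (assumes UTF-8)."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "not plain text"  # e.g. an .exe: invalid UTF-8 byte sequences

    # Binary formats usually contain NUL or other C0 control characters that
    # an ordinary editor would mangle; tab, newline, CR and form feed are fine.
    allowed_controls = {"\t", "\n", "\r", "\f"}
    if any(ord(ch) < 0x20 and ch not in allowed_controls for ch in text):
        return "not plain text"

    longest = max((len(line) for line in text.splitlines()), default=0)
    if longest > max_columns:
        return "unfriendly plain text"  # e.g. minified JavaScript on one line
    return "plain text"


print(classify_text(b"MZ\x90\x00\x03\x00"))              # not plain text
print(classify_text(b"function f(){return 1}" * 20))  # unfriendly plain text
print(classify_text(b"def f():\n    return 1\n"))     # plain text
```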

Both definitions are correct and are regularly used.

Personally, I find "human-readable" to be a better term for your definition, and I use "plaintext" to mean either unformatted text (except perhaps for whitespace) or the non-markup text within a marked-up document.
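
For the second sense, "the non-markup text within a marked-up document", Python's standard-library `html.parser` can collect just the character data. This is a minimal sketch using HTML as the example markup; the class and function names are made up for illustration, and a real extractor would also need to skip `<script>`/`<style>` contents and handle block-level whitespace.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects character data and ignores tags (a minimal sketch)."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as '&'
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def markup_to_plaintext(markup: str) -> str:
    collector = TextCollector()
    collector.feed(markup)
    collector.close()  # flushes any trailing buffered text
    return "".join(collector.parts)


print(markup_to_plaintext("<p>Hello, <b>world</b> &amp; friends!</p>"))
# Hello, world & friends!
```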

Wiktionary suggests that the divide is contextual, with your definition being the 'file format' definition and GP's definition being the 'computing' definition.[0]

[0] - https://en.wiktionary.org/wiki/plain_text

To me it's something like "the target language does not differ from the expressing language"?

A .txt file for notes is plaintext, because the language I'm using doesn't have to be compiled for my goal. Programming languages are not, because the expressed language is compiled into some other target language (machine code).

Markdown is not, because it's compiled into HTML.

A .txt undergoes no transformations from my writing, to its storage, to my later usage of it.

As opposed to what? A pdf?