by Singletoned 1574 days ago
I hate to be pedantic[1], but:

> HTML, Markdown, JSON, LaTeX, and many other standard formats, are just plain text.

On this definition, Word and Excel are just (zipped) plain text files.

> Every device, including ones long gone, and ones not invented yet, can read and edit plain text.

This definitely isn't true, and it kind of misses the point that there's no such thing as "plain text". It's still encoded in ascii, or utf-8, and still potentially has problems being read on other machines.
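To make the encoding point concrete, here is a minimal Python sketch (the sample strings are my own, not from the article) of what happens when the reader guesses the wrong encoding. Nothing fails loudly; the text just comes out wrong, except for the pure-ASCII part.

```python
# The same text, written out as UTF-8 bytes.
text = "naïve café"
utf8_bytes = text.encode("utf-8")

# Read back with the right encoding: round-trips cleanly.
roundtrip = utf8_bytes.decode("utf-8")

# Read back as Latin-1: no error is raised, but each accented letter
# becomes two unrelated characters (mojibake).
misread = utf8_bytes.decode("latin-1")
print(misread)

# Pure ASCII survives either guess, because UTF-8 and Latin-1
# agree on byte values 0-127.
ascii_bytes = "plain text".encode("ascii")
ascii_is_safe = ascii_bytes.decode("utf-8") == ascii_bytes.decode("latin-1")
```

So "plain text" is only portable to the extent that both sides agree on the encoding, which in practice they mostly do for ASCII.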

It's reasonable to say that ascii has become so ubiquitous as to be universal, but it definitely wasn't always so, and won't definitely always be.

[1] Okay, I love to be pedantic

5 comments

You can't be pedantic and then say Word and Excel are files. They're applications.

On a more serious note, ASCII and, nowadays, UTF-8 are customarily considered plain text; the fact that a specific charset is used doesn't mean it's not text.

> that there's no such thing as "plain text"

Please show me a computing device that cannot deal with ASCII.

And UTF-8 has, by now, reached a level of ubiquity that encompasses almost everything in IT as well.

Indeed, why not use real plain text, not Markdown?

Homer was able to write the fall of troy and Shakespeare Hamlet without bold text.

So what can't one express without?

If anyone in this thread has a talent for writing that compares to either Homer or Shakespeare, then we're in excellent company!

The rest of us, who can't necessarily achieve the desired effect through words alone, need typographic assistance — much like how I would need a bicycle to keep up with Haile Gebrselassie.

Markdown is "real plain text".

Why not just use letters and avoid punctuation ... that's all Markdown is: punctuation.

> Homer was able to write the fall of troy and Shakespeare Hamlet without bold text.

Oh really? Have you ever seen a manuscript?

Unlike handwritten text, plain text doesn't provide a means to underline, use italics, subscripts and superscripts, etc. Things like Markdown provide conventions for denoting such things in plain text.

> without bold text.

Are you sure they didn't use bold text, though?

Maybe a thicker quill from a larger goose?

No, you just hold it a little differently. Twist it sideways, like.

There are still EBCDIC systems being used.

I'll just quote my own answer from elsewhere in this thread:

    Even if such a device needs to be used, decoding ascii is a trivial lookup operation, 
    not remotely comparable to decoding some arcane binary format, or a convoluted XML-
    derived format such as they are used in WYSIWYG editor formats.
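A rough illustration of the "trivial lookup" claim, as a Python sketch. It assumes code page 037, which is only one of several EBCDIC variants, and the helper name `ebcdic_to_text` is made up for this example. The whole decoder is a 256-entry table and one index operation per byte.

```python
# Build the lookup table once: index = EBCDIC byte value, entry = character.
# Assumption: code page 037; a real system may use a different EBCDIC variant.
EBCDIC_TABLE = bytes(range(256)).decode("cp037", errors="replace")

def ebcdic_to_text(data: bytes) -> str:
    """Decode EBCDIC bytes with one table lookup per byte (hypothetical helper)."""
    return "".join(EBCDIC_TABLE[b] for b in data)

# The bytes for "HELLO" in EBCDIC code page 037.
sample = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])
print(ebcdic_to_text(sample))
```

Python's built-in `cp037` codec does the same job directly; the table is spelled out here only to show how little machinery is involved.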
The Commodore 64 I’m building could probably be taught how to read ASCII with enough effort, but out of the box it can’t.

I appreciate modern computers conform to standards, but that doesn’t mean that these standards have always existed, or will always exist.

> but out of the box it can’t.

Even if such a device needs to be used, decoding ascii is a trivial lookup operation, not remotely comparable to decoding some arcane binary format, or a convoluted XML-derived format such as they are used in WYSIWYG editor formats.

Yes, text relies on an encoding standard. So do numbers, btw (big/little endian, 2's/1's complement, sign/magnitude, floating-point representations), element enumeration (0- vs 1-based indexing), and even boolean logic (e.g. 0 is true in bash, everything else is false).
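The number case is easy to show with a short Python sketch (the values are arbitrary, picked for illustration): the same four bytes mean different integers depending on the byte order you assume, and the same single byte means a different integer depending on whether you read it as signed.

```python
import struct

value = 1
big = struct.pack(">I", value)     # 32-bit unsigned, big-endian
little = struct.pack("<I", value)  # 32-bit unsigned, little-endian

# Same bytes, wrong byte order assumed: a completely different number.
misread = struct.unpack("<I", big)[0]

# Two's complement: the byte 0xFF is 255 if unsigned, -1 if signed.
as_unsigned = int.from_bytes(b"\xff", "big", signed=False)
as_signed = int.from_bytes(b"\xff", "big", signed=True)

print(big, little, misread, as_unsigned, as_signed)
```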

At the end of the day, computers represent only 2 states: On and Off. Everything beyond that needs an encoding.

And some of these encodings are, at this point, so universal and so simple that they can be considered as much a standard of the IT world as 0 and 1. ASCII is one of those.

A Commodore 64 can certainly read ASCII out of the box since ASCII is just data ... I'm not sure that person purportedly building a Commodore 64 understands how computers work. And the Commodore 64 display hardware expects PETSCII, which is based on ASCII-1963 ... a few characters will be displayed incorrectly, but alphanumerics will be fine.
ASCII text is just byte data ... nothing has to be done to "teach" a Commodore 64 to read ASCII. Displaying it on the screen is slightly problematic because the Commodore 64 uses PETSCII, which is an older version of ASCII, so some characters will be displayed differently, but it's not a terribly big deal.

> that doesn’t mean that these standards have always existed, or will always exist

These are irrelevant "points". ASCII will still be in use when human civilization ends soon (possibly next week due to nuclear war, or else in a few hundred years due to global warming).

Older Word (doc) and Excel (xls) files aren't "zipped" and aren't plain text files.

> This definitely isn't true

Yes, actually it is.

> and it kind of misses the point that there's no such thing as "plain text".

Who here made such a point? Anyway, that's not true either.

> It's still encoded in ascii, or utf-8

So, plain text files.

> and still potentially has problems being read on other machines.

What "other machines"? What problems? What matters is the software, not "machines".

> It's reasonable to say that ascii has become so ubiquitous as to be universal, but it definitely wasn't always so

I was alive when EBCDIC was common, but that isn't relevant.

> and won't definitely always be.

Sure, there's the heat death of the universe eventually.

When plain text is zipped, it is no longer plain text, and so it loses the advantages of simple text files unless there is additional tooling that unzips it again. This creates some friction, of course. General-purpose tools do not bother to implement knowledge of every format on the planet, so those zips stay zips and are treated as binary data by version control, which makes them not too useful.
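A small Python sketch of that friction. The XML snippet and the member name `word/document.xml` are stand-ins chosen for illustration; this is not a valid .docx, just zipped text. A byte-level search over the archive (roughly what a generic grep or diff sees) does not find the text, while a tool that knows to unzip first gets it back intact.

```python
import io
import zipfile

# Stand-in for an XML part inside an Office file; not a real document.
xml_text = "<w:t>Hello, plain text</w:t>" * 20

# Write it into an in-memory zip archive with DEFLATE compression.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("word/document.xml", xml_text)
raw = buf.getvalue()

# A generic tool scanning the raw bytes does not see the text.
found_in_raw = b"Hello, plain text" in raw

# A zip-aware tool recovers it exactly.
with zipfile.ZipFile(io.BytesIO(raw)) as zf:
    recovered = zf.read("word/document.xml").decode("utf-8")

print(found_in_raw, recovered == xml_text)
```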
> On this definition, Word and Excel are just (zipped) plain text files.

But this is true only for newer versions. In older versions it was a binary salad derived from C's data model.

And even in new versions these are far from files you can safely edit by hand.

doc and xls files have nothing to do with "C's data model".