by d-us-vb 60 days ago
I feel like that isn’t a very useful definition of plaintext. If you mean “ASCII”, say ASCII.

Plain text is text intended to be interpreted as bytes that map simply to characters. Complexity is irrelevant.

2 comments

https://en.wikipedia.org/wiki/Plaintext

  With the advent of computing, the term plaintext expanded beyond human-readable documents to mean any data, including binary files, in a form that can be viewed or used without requiring a key or other decryption device. Information—a message, document, file, etc.—if to be communicated or stored in an unencrypted form is referred to as plaintext.
https://csrc.nist.gov/glossary/term/plaintext

    Unencrypted information that may be input to an encryption operation. Note: Plain text is not a synonym for clear text. See clear text.

    Intelligible data that has meaning and can be understood without the application of decryption.
Unfortunately no, Unicode is not simply a mapping of bytes to characters. It is a mapping of numbers to code points, and in some cases you can even get the same characters with multiple code point sequences (not a very good mapping!). Then you need to convert numbers to bytes, so aside from Unicode you also need an encoding. And there are multiple choices. So what would be "plain text" then? UTF-16? UTF-8? If so, with or without BOM? It can't be all of them. For something to really be "plain text" it has to be the same thing to everyone...
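To make the two points above concrete, here is a minimal Python sketch using only the standard library. The sample character (é) and the list of encodings are picked for illustration and are not taken from the thread. It shows one visible character written as two different code point sequences, and the same code point turning into different bytes depending on the encoding and on whether a BOM is written.

```python
import unicodedata

# Two code point sequences that display as the same character.
composed = "\u00e9"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

# A plain comparison treats them as different strings.
print(composed == decomposed)

# Normalizing to NFC composes the second sequence into the first.
print(unicodedata.normalize("NFC", decomposed) == composed)

# The same single code point becomes different bytes per encoding.
# "utf-8-sig" is Python's name for UTF-8 with a leading BOM.
encodings = {
    name: composed.encode(name)
    for name in ("utf-8", "utf-8-sig", "utf-16-le", "utf-16-be")
}
for name, raw in encodings.items():
    print(f"{name:10} {raw.hex(' ')}")
```

A reader who receives only the bytes has to know, or guess, which of these encodings was used before any of it is "text" again.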
> Unfortunately no, Unicode is not simply a mapping of bytes to characters. It is a mapping of numbers to code points, and in some cases you can even get the same characters with multiple code point sequences (not a very good mapping!).

It is worse than that: you can also get different characters from the same code points, the same code points and characters where some uses need them to be different, and different code points and characters where some uses need them to be the same, etc.
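One possible reading of the "different code points that should be the same for some uses" case, sketched in Python with the standard library. The three sample letters are my own choice, not from the comment. The opposite case, where the same code point is drawn differently depending on font or language (as can happen with unified Han characters), is a rendering matter and cannot be shown by comparing strings, so it is left out here.

```python
import unicodedata

latin_a = "A"            # U+0041 LATIN CAPITAL LETTER A
cyrillic_a = "\u0410"    # U+0410 CYRILLIC CAPITAL LETTER A
fullwidth_a = "\uff21"   # U+FF21 FULLWIDTH LATIN CAPITAL LETTER A

# All three look like a capital A, yet they are distinct code points.
for ch in (latin_a, cyrillic_a, fullwidth_a):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# NFKC (compatibility normalization) folds the fullwidth form into Latin A...
folded_fullwidth = unicodedata.normalize("NFKC", fullwidth_a)
print(folded_fullwidth == latin_a)

# ...but leaves the Cyrillic letter alone, since it is a different letter
# that merely looks the same in most fonts.
folded_cyrillic = unicodedata.normalize("NFKC", cyrillic_a)
print(folded_cyrillic == latin_a)
```

Whether the fullwidth or Cyrillic form "should" equal the Latin one depends on the use: a search box might want them folded together, while a text editor must keep them apart.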