Hacker News
by maxerickson 2387 days ago
One of your points is that an encoding designed to handle languages has support for more than one kind of white space. Given that languages use more than one kind of white space, this is sort of a necessity.
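For reference, here is a quick Python sketch (standard library only). It lists a few of the distinct white-space code points Unicode defines and shows how Python's `str.isspace` and `str.split` treat them. The sample set and the `describe` helper are illustrative choices, not an exhaustive list.

```python
import unicodedata

# An illustrative handful of code points that Python's str.isspace treats as white space.
SAMPLES = ["\u0020", "\u00a0", "\u2003", "\u3000", "\t", "\n"]

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    # Control characters such as TAB have no formal Unicode name, hence the default.
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<control>')}"

for ch in SAMPLES:
    print(describe(ch), "isspace =", ch.isspace())

# str.split() with no argument splits on all of these, not just U+0020.
print("a\u00a0b\u2003c\u3000d".split())  # ['a', 'b', 'c', 'd']
```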

Another one is that a standard designed to support all languages has a feature necessary for supporting some languages.

Those aren't inconsistencies, so do feel free to go on.

> One of your points is that an encoding designed to handle languages has support for more than one kind of white space.

No. It is that there is more than one kind of invisible character. No language has invisible characters.

> Another one is that a standard designed to support all languages has a feature necessary for supporting some languages.

Not sure what point you are misreading here. But that was not among my points.

You said "But there are whole languages (particularly from the Indian subcontinent) that cannot be written without combining characters."

I suppose I didn't consider that they could be written without combining characters given a different design.

As far as invisible characters go, I'm not interested in arguing about it. English, as written, has all sorts of different structural uses of white space; it isn't all just style.

> I suppose I didn't consider that they could be written without combining characters given a different design.

They could be.

Likewise, European languages can be written without precomposed characters. The fact that é can be written in multiple ways was my point.
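The é point can be checked with Python's standard `unicodedata` module. The precomposed and decomposed spellings compare unequal until both are normalized. The Devanagari pair at the end (KA + VOWEL SIGN I) is an illustrative example of a syllable that has no precomposed code point at all.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

print(precomposed == decomposed)           # False: different code point sequences
print(len(precomposed), len(decomposed))   # 1 2

# Canonical normalization maps both spellings onto one form.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True

# DEVANAGARI LETTER KA + DEVANAGARI VOWEL SIGN I: no precomposed form, so NFC keeps two.
ki = "\u0915\u093f"
print(len(unicodedata.normalize("NFC", ki)))  # 2
```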

> As far as invisible characters go, I'm not interested in arguing about it. English, as written, has all sorts of different structural uses of white space; it isn't all just style.

You still don't understand. I am not talking about whitespace. I am talking about invisible zero-width characters that can be slipped into text with no sign that they are there: characters like U+180E, U+200B, U+FEFF, U+200C, and U+200D. Not to mention that you can achieve the same thing with control characters like U+200F and U+200E. (Whether those last two go undetected depends on the language.)
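Here is a minimal Python sketch of how such characters can be found. It scans a string for the specific zero-width and directional code points mentioned above and reports their positions and Unicode names. The `find_invisible` helper is hypothetical; the names come from Python's standard `unicodedata` module.

```python
import unicodedata

# The zero-width and directional code points under discussion.
INVISIBLE = {
    0x180E,  # MONGOLIAN VOWEL SEPARATOR
    0x200B,  # ZERO WIDTH SPACE
    0x200C,  # ZERO WIDTH NON-JOINER
    0x200D,  # ZERO WIDTH JOINER
    0x200E,  # LEFT-TO-RIGHT MARK
    0x200F,  # RIGHT-TO-LEFT MARK
    0xFEFF,  # ZERO WIDTH NO-BREAK SPACE (byte order mark)
}

def find_invisible(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX NAME') for each listed invisible character in text."""
    hits = []
    for i, ch in enumerate(text):
        if ord(ch) in INVISIBLE:
            hits.append((i, f"U+{ord(ch):04X} {unicodedata.name(ch)}"))
    return hits

sample = "pay\u200bpal"
print(sample == "paypal")      # False, though the two usually render identically
print(find_invisible(sample))  # [(3, 'U+200B ZERO WIDTH SPACE')]
```

A broader heuristic is to flag every character whose `unicodedata.category` is `"Cf"` (format). That covers every code point in the set above, but it also catches legitimate characters such as U+00AD SOFT HYPHEN.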

As I said, this can be used to invisibly sign a document. But I don't see any other particular point to having so many ways to accomplish what looks like nothing.
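Here is one way the "invisibly sign a document" idea could work, sketched in Python. It hides a per-recipient number as a run of zero-width characters, so each copy looks the same but compares unequal. The bit encoding, the insertion point, and the `sign`/`read_signature` names are hypothetical choices for illustration, not an established scheme.

```python
from typing import Optional

# Hypothetical encoding: ZERO WIDTH SPACE is a 0 bit, ZERO WIDTH NON-JOINER is a 1 bit.
ZERO, ONE = "\u200b", "\u200c"

def sign(text: str, tag: int, bits: int = 16) -> str:
    """Hide the low `bits` bits of `tag` (most significant first) after the first character."""
    payload = "".join(ONE if (tag >> i) & 1 else ZERO for i in reversed(range(bits)))
    return text[:1] + payload + text[1:]

def read_signature(text: str) -> Optional[int]:
    """Rebuild the tag from every ZERO/ONE mark in the text; None if there are none."""
    marks = [c for c in text if c in (ZERO, ONE)]
    if not marks:
        return None
    value = 0
    for c in marks:
        value = (value << 1) | (1 if c == ONE else 0)
    return value

original = "Confidential memo"
copy = sign(original, tag=42)
print(copy == original)           # False
print(len(copy) - len(original))  # 16
print(read_signature(copy))       # 42
```

The reader assumes the unsigned text contained no U+200B or U+200C of its own. Any step that strips zero-width characters erases the mark, and tags wider than `bits` are silently truncated.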